Dataset statistics
| Number of variables | 29 |
|---|---|
| Number of observations | 1949630 |
| Missing cells | 16662616 |
| Missing cells (%) | 29.5% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 431.4 MiB |
| Average record size in memory | 232.0 B |
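The percentage and per-record figures in the table above follow from the raw counts. The following is a minimal sketch of that arithmetic. It assumes that "Missing cells (%)" is taken over all cells (variables times observations) and that 1 MiB is 1,048,576 bytes; the report states neither explicitly.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 29
n_observations = 1_949_630
missing_cells = 16_662_616
total_size_mib = 431.4

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the recomputed values round to the 29.5% and 232.0 B shown in the table.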
Variable types
| Categorical | 20 |
|---|---|
| Unsupported | 1 |
| Numeric | 8 |
Alerts
CRASH DATE has a high cardinality: 3804 distinct values | High cardinality |
CRASH TIME has a high cardinality: 1440 distinct values | High cardinality |
LOCATION has a high cardinality: 262859 distinct values | High cardinality |
ON STREET NAME has a high cardinality: 17394 distinct values | High cardinality |
CROSS STREET NAME has a high cardinality: 19731 distinct values | High cardinality |
OFF STREET NAME has a high cardinality: 202042 distinct values | High cardinality |
CONTRIBUTING FACTOR VEHICLE 1 has a high cardinality: 61 distinct values | High cardinality |
CONTRIBUTING FACTOR VEHICLE 2 has a high cardinality: 61 distinct values | High cardinality |
CONTRIBUTING FACTOR VEHICLE 3 has a high cardinality: 51 distinct values | High cardinality |
VEHICLE TYPE CODE 1 has a high cardinality: 1450 distinct values | High cardinality |
VEHICLE TYPE CODE 2 has a high cardinality: 1622 distinct values | High cardinality |
VEHICLE TYPE CODE 3 has a high cardinality: 230 distinct values | High cardinality |
VEHICLE TYPE CODE 4 has a high cardinality: 91 distinct values | High cardinality |
VEHICLE TYPE CODE 5 has a high cardinality: 63 distinct values | High cardinality |
NUMBER OF PERSONS INJURED is highly overall correlated with NUMBER OF PEDESTRIANS INJURED and 1 other fields | High correlation |
NUMBER OF PERSONS KILLED is highly overall correlated with NUMBER OF PEDESTRIANS KILLED and 2 other fields | High correlation |
NUMBER OF MOTORIST INJURED is highly overall correlated with NUMBER OF PERSONS INJURED | High correlation |
NUMBER OF MOTORIST KILLED is highly overall correlated with NUMBER OF PERSONS KILLED | High correlation |
NUMBER OF PEDESTRIANS KILLED is highly overall correlated with NUMBER OF PERSONS KILLED and 1 other fields | High correlation |
NUMBER OF CYCLIST KILLED is highly overall correlated with NUMBER OF PERSONS KILLED and 1 other fields | High correlation |
CONTRIBUTING FACTOR VEHICLE 3 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields | High correlation |
CONTRIBUTING FACTOR VEHICLE 4 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields | High correlation |
CONTRIBUTING FACTOR VEHICLE 5 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 4 other fields | High correlation |
LATITUDE is highly overall correlated with LONGITUDE | High correlation |
LONGITUDE is highly overall correlated with LATITUDE | High correlation |
NUMBER OF PEDESTRIANS INJURED is highly overall correlated with NUMBER OF PERSONS INJURED | High correlation |
NUMBER OF CYCLIST INJURED is highly overall correlated with VEHICLE TYPE CODE 4 and 1 other fields | High correlation |
CONTRIBUTING FACTOR VEHICLE 1 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 2 and 4 other fields | High correlation |
CONTRIBUTING FACTOR VEHICLE 2 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields | High correlation |
COLLISION_ID is highly overall correlated with VEHICLE TYPE CODE 4 and 1 other fields | High correlation |
VEHICLE TYPE CODE 4 is highly overall correlated with NUMBER OF CYCLIST INJURED and 3 other fields | High correlation |
VEHICLE TYPE CODE 5 is highly overall correlated with NUMBER OF CYCLIST INJURED and 3 other fields | High correlation |
BOROUGH has 605465 (31.1%) missing values | Missing |
ZIP CODE has 605701 (31.1%) missing values | Missing |
LATITUDE has 224428 (11.5%) missing values | Missing |
LONGITUDE has 224428 (11.5%) missing values | Missing |
LOCATION has 224428 (11.5%) missing values | Missing |
ON STREET NAME has 405870 (20.8%) missing values | Missing |
CROSS STREET NAME has 719709 (36.9%) missing values | Missing |
OFF STREET NAME has 1636131 (83.9%) missing values | Missing |
CONTRIBUTING FACTOR VEHICLE 2 has 291964 (15.0%) missing values | Missing |
CONTRIBUTING FACTOR VEHICLE 3 has 1812944 (93.0%) missing values | Missing |
CONTRIBUTING FACTOR VEHICLE 4 has 1919228 (98.4%) missing values | Missing |
CONTRIBUTING FACTOR VEHICLE 5 has 1941480 (99.6%) missing values | Missing |
VEHICLE TYPE CODE 2 has 353893 (18.2%) missing values | Missing |
VEHICLE TYPE CODE 3 has 1817465 (93.2%) missing values | Missing |
VEHICLE TYPE CODE 4 has 1920206 (98.5%) missing values | Missing |
VEHICLE TYPE CODE 5 has 1941717 (99.6%) missing values | Missing |
LATITUDE is highly skewed (γ1 = -21.17608353) | Skewed |
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 34.8813989) | Skewed |
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 55.83766352) | Skewed |
COLLISION_ID has unique values | Unique |
ZIP CODE is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
NUMBER OF PERSONS INJURED has 1527469 (78.3%) zeros | Zeros |
NUMBER OF PERSONS KILLED has 1946976 (99.9%) zeros | Zeros |
NUMBER OF PEDESTRIANS INJURED has 1848975 (94.8%) zeros | Zeros |
NUMBER OF MOTORIST INJURED has 1678914 (86.1%) zeros | Zeros |
NUMBER OF MOTORIST KILLED has 1948606 (99.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-05 14:44:27.020096 |
|---|---|
| Analysis finished | 2022-12-05 14:54:24.926646 |
| Duration | 9 minutes and 57.91 seconds |
| Software version | pandas-profiling v3.5.0 |
CRASH DATE
Categorical
| Distinct | 3804 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 19496300 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 09/11/2021 |
|---|---|
| 2nd row | 03/26/2022 |
| 3rd row | 06/29/2022 |
| 4th row | 09/11/2021 |
| 5th row | 12/14/2021 |
Common Values
| Value | Count | Frequency (%) |
| 01/21/2014 | 1161 | 0.1% |
| 11/15/2018 | 1065 | 0.1% |
| 12/15/2017 | 999 | 0.1% |
| 05/19/2017 | 974 | < 0.1% |
| 01/18/2015 | 961 | < 0.1% |
| 02/03/2014 | 960 | < 0.1% |
| 03/06/2015 | 939 | < 0.1% |
| 05/18/2017 | 911 | < 0.1% |
| 01/07/2017 | 896 | < 0.1% |
| 03/02/2018 | 884 | < 0.1% |
| Other values (3794) | 1939880 | 99.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4442574 | |
| / | 3899260 | |
| 2 | 3582396 | |
| 1 | 3455960 | |
| 3 | 646954 | 3.3% |
| 7 | 599064 | 3.1% |
| 8 | 598135 | 3.1% |
| 6 | 588171 | 3.0% |
| 9 | 572814 | 2.9% |
| 5 | 571573 | 2.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 15597040 | |
| Other Punctuation | 3899260 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4442574 | |
| 2 | 3582396 | |
| 1 | 3455960 | |
| 3 | 646954 | 4.1% |
| 7 | 599064 | 3.8% |
| 8 | 598135 | 3.8% |
| 6 | 588171 | 3.8% |
| 9 | 572814 | 3.7% |
| 5 | 571573 | 3.7% |
| 4 | 539399 | 3.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 3899260 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 19496300 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 4442574 | |
| / | 3899260 | |
| 2 | 3582396 | |
| 1 | 3455960 | |
| 3 | 646954 | 3.3% |
| 7 | 599064 | 3.1% |
| 8 | 598135 | 3.1% |
| 6 | 588171 | 3.0% |
| 9 | 572814 | 2.9% |
| 5 | 571573 | 2.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 19496300 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 4442574 | |
| / | 3899260 | |
| 2 | 3582396 | |
| 1 | 3455960 | |
| 3 | 646954 | 3.3% |
| 7 | 599064 | 3.1% |
| 8 | 598135 | 3.1% |
| 6 | 588171 | 3.0% |
| 9 | 572814 | 2.9% |
| 5 | 571573 | 2.9% |
CRASH TIME
Categorical
| Distinct | 1440 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.7403538 |
| Min length | 4 |
Characters and Unicode
| Total characters | 9241936 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2:39 |
|---|---|
| 2nd row | 11:45 |
| 3rd row | 6:55 |
| 4th row | 9:35 |
| 5th row | 8:13 |
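CRASH DATE and CRASH TIME are profiled as high-cardinality categorical strings, but the samples suggest a fixed MM/DD/YYYY date and a 24-hour H:MM or HH:MM time. A minimal sketch of combining the two into one timestamp, assuming those formats hold for every row (the report does not confirm this); `parse_crash_timestamp` is a hypothetical helper name:

```python
from datetime import datetime


def parse_crash_timestamp(crash_date: str, crash_time: str) -> datetime:
    """Combine a CRASH DATE and a CRASH TIME string into one datetime."""
    # Assumed formats: MM/DD/YYYY and 24-hour H:MM or HH:MM.
    # %H also accepts a single-digit hour such as "2:39".
    return datetime.strptime(f"{crash_date} {crash_time}", "%m/%d/%Y %H:%M")


# First sample rows of the two variables.
ts = parse_crash_timestamp("09/11/2021", "2:39")
print(ts.isoformat())
```

Parsing into a real datetime type would let a profiler treat the column as a date rather than as 3804 unrelated categories.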
Common Values
| Value | Count | Frequency (%) |
| 16:00 | 27307 | 1.4% |
| 17:00 | 26755 | 1.4% |
| 15:00 | 26661 | 1.4% |
| 18:00 | 24709 | 1.3% |
| 14:00 | 24464 | 1.3% |
| 13:00 | 22703 | 1.2% |
| 9:00 | 20512 | 1.1% |
| 12:00 | 20465 | 1.0% |
| 19:00 | 20434 | 1.0% |
| 16:30 | 19724 | 1.0% |
| Other values (1430) | 1715896 | 88.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| : | 1949630 | |
| 0 | 1803518 | |
| 1 | 1710640 | |
| 5 | 828730 | |
| 2 | 765579 | 8.3% |
| 3 | 632346 | 6.8% |
| 4 | 508060 | 5.5% |
| 8 | 295346 | 3.2% |
| 7 | 254463 | 2.8% |
| 9 | 253914 | 2.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7292306 | |
| Other Punctuation | 1949630 | 21.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1803518 | |
| 1 | 1710640 | |
| 5 | 828730 | |
| 2 | 765579 | |
| 3 | 632346 | 8.7% |
| 4 | 508060 | 7.0% |
| 8 | 295346 | 4.1% |
| 7 | 254463 | 3.5% |
| 9 | 253914 | 3.5% |
| 6 | 239710 | 3.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 1949630 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 9241936 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| : | 1949630 | |
| 0 | 1803518 | |
| 1 | 1710640 | |
| 5 | 828730 | |
| 2 | 765579 | 8.3% |
| 3 | 632346 | 6.8% |
| 4 | 508060 | 5.5% |
| 8 | 295346 | 3.2% |
| 7 | 254463 | 2.8% |
| 9 | 253914 | 2.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 9241936 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| : | 1949630 | |
| 0 | 1803518 | |
| 1 | 1710640 | |
| 5 | 828730 | |
| 2 | 765579 | 8.3% |
| 3 | 632346 | 6.8% |
| 4 | 508060 | 5.5% |
| 8 | 295346 | 3.2% |
| 7 | 254463 | 2.8% |
| 9 | 253914 | 2.7% |
BOROUGH
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 605465 |
| Missing (%) | 31.1% |
| Memory size | 14.9 MiB |
Length
| Max length | 13 |
|---|---|
| Median length | 9 |
| Mean length | 7.4601481 |
| Min length | 5 |
Characters and Unicode
| Total characters | 10027670 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | BROOKLYN |
|---|---|
| 2nd row | BROOKLYN |
| 3rd row | BRONX |
| 4th row | BROOKLYN |
| 5th row | MANHATTAN |
Common Values
| Value | Count | Frequency (%) |
| BROOKLYN | 424950 | 21.8% |
| QUEENS | 360087 | 18.5% |
| MANHATTAN | 305117 | 15.6% |
| BRONX | 197581 | 10.1% |
| STATEN ISLAND | 56430 | 2.9% |
| (Missing) | 605465 | 31.1% |
Words
| Value | Count | Frequency (%) |
| brooklyn | 424950 | 30.3% |
| queens | 360087 | 25.7% |
| manhattan | 305117 | 21.8% |
| bronx | 197581 | 14.1% |
| staten | 56430 | 4.0% |
| island | 56430 | 4.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| N | 1705712 | |
| O | 1047481 | |
| A | 1028211 | |
| E | 776604 | 7.7% |
| T | 723094 | 7.2% |
| R | 622531 | 6.2% |
| B | 622531 | 6.2% |
| L | 481380 | 4.8% |
| S | 472947 | 4.7% |
| Y | 424950 | 4.2% |
| Other values (9) | 2122229 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 9971240 | |
| Space Separator | 56430 | 0.6% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 1705712 | |
| O | 1047481 | |
| A | 1028211 | |
| E | 776604 | 7.8% |
| T | 723094 | 7.3% |
| R | 622531 | 6.2% |
| B | 622531 | 6.2% |
| L | 481380 | 4.8% |
| S | 472947 | 4.7% |
| Y | 424950 | 4.3% |
| Other values (8) | 2065799 |
Space Separator
| Value | Count | Frequency (%) |
| 56430 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9971240 | |
| Common | 56430 | 0.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| N | 1705712 | |
| O | 1047481 | |
| A | 1028211 | |
| E | 776604 | 7.8% |
| T | 723094 | 7.3% |
| R | 622531 | 6.2% |
| B | 622531 | 6.2% |
| L | 481380 | 4.8% |
| S | 472947 | 4.7% |
| Y | 424950 | 4.3% |
| Other values (8) | 2065799 |
Common
| Value | Count | Frequency (%) |
| 56430 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10027670 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| N | 1705712 | |
| O | 1047481 | |
| A | 1028211 | |
| E | 776604 | 7.7% |
| T | 723094 | 7.2% |
| R | 622531 | 6.2% |
| B | 622531 | 6.2% |
| L | 481380 | 4.8% |
| S | 472947 | 4.7% |
| Y | 424950 | 4.2% |
| Other values (9) | 2122229 |
LATITUDE
Numeric
| Distinct | 124579 |
|---|---|
| Distinct (%) | 7.2% |
| Missing | 224428 |
| Missing (%) | 11.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.634495 |
| Minimum | 0 |
|---|---|
| Maximum | 43.344444 |
| Zeros | 3802 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.596817 |
| Q1 | 40.66807 |
| median | 40.721336 |
| Q3 | 40.76943 |
| 95-th percentile | 40.861976 |
| Maximum | 43.344444 |
| Range | 43.344444 |
| Interquartile range (IQR) | 0.10136 |
Descriptive statistics
| Standard deviation | 1.9113294 |
|---|---|
| Coefficient of variation (CV) | 0.047037113 |
| Kurtosis | 447.20649 |
| Mean | 40.634495 |
| Median Absolute Deviation (MAD) | 0.051212 |
| Skewness | -21.176084 |
| Sum | 70102713 |
| Variance | 3.6531799 |
| Monotonicity | Not monotonic |
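Several of the descriptive statistics above are derived from others in the same tables. As a quick consistency check, the sketch below recomputes them from the rounded figures reported for this variable, so small rounding differences are expected.

```python
import math

# Values copied from the quantile and descriptive statistics tables.
mean, std = 40.634495, 1.9113294
q1, q3 = 40.66807, 40.76943
minimum, maximum = 0.0, 43.344444

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"CV={cv:.6f} variance={variance:.5f} IQR={iqr:.5f} range={value_range:.6f}")
print(math.isclose(cv, 0.047037113, rel_tol=1e-6))
```

The minimum of 0 drives both the range and the strong negative skew; the interquartile range of about 0.1 degrees is a better indication of the spread of the bulk of the values.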
Common Values
| Value | Count | Frequency (%) |
| 0 | 3802 | 0.2% |
| 40.861862 | 798 | < 0.1% |
| 40.696033 | 711 | < 0.1% |
| 40.8047 | 691 | < 0.1% |
| 40.608757 | 670 | < 0.1% |
| 40.798256 | 626 | < 0.1% |
| 40.759308 | 603 | < 0.1% |
| 40.6960346 | 587 | < 0.1% |
| 40.675735 | 505 | < 0.1% |
| 40.7606005 | 474 | < 0.1% |
| Other values (124569) | 1715735 | 88.0% |
| (Missing) | 224428 | 11.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 3802 | 0.2% |
| 30.78418 | 1 | < 0.1% |
| 34.783634 | 1 | < 0.1% |
| 40.4989488 | 2 | < 0.1% |
| 40.4991346 | 1 | < 0.1% |
| 40.49931 | 1 | < 0.1% |
| 40.4994787 | 1 | < 0.1% |
| 40.499659 | 1 | < 0.1% |
| 40.49971 | 1 | < 0.1% |
| 40.49984 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 43.344444 | 1 | < 0.1% |
| 42.64154 | 1 | < 0.1% |
| 42.318317 | 1 | < 0.1% |
| 42.107204 | 1 | < 0.1% |
| 41.91661 | 1 | < 0.1% |
| 41.34796 | 1 | < 0.1% |
| 41.258785 | 1 | < 0.1% |
| 41.12615 | 5 | < 0.1% |
| 41.12421 | 1 | < 0.1% |
| 41.061634 | 2 | < 0.1% |
LONGITUDE
Numeric
| Distinct | 97170 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 224428 |
| Missing (%) | 11.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -73.764712 |
| Minimum | -201.35999 |
|---|---|
| Maximum | 0 |
| Zeros | 3802 |
| Zeros (%) | 0.2% |
| Negative | 1721400 |
| Negative (%) | 88.3% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | -201.35999 |
|---|---|
| 5-th percentile | -74.03485 |
| Q1 | -73.97504 |
| median | -73.92747 |
| Q3 | -73.866595 |
| 95-th percentile | -73.76325 |
| Maximum | 0 |
| Range | 201.35999 |
| Interquartile range (IQR) | 0.1084448 |
Descriptive statistics
| Standard deviation | 3.6108736 |
|---|---|
| Coefficient of variation (CV) | -0.048951232 |
| Kurtosis | 476.40518 |
| Mean | -73.764712 |
| Median Absolute Deviation (MAD) | 0.052816 |
| Skewness | 16.098825 |
| Sum | -1.2725903 × 10⁸ |
| Variance | 13.038408 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 3802 | 0.2% |
| -73.91282 | 717 | < 0.1% |
| -73.98453 | 697 | < 0.1% |
| -73.89063 | 689 | < 0.1% |
| -74.038086 | 672 | < 0.1% |
| -73.91243 | 651 | < 0.1% |
| -73.89686 | 601 | < 0.1% |
| -73.9845292 | 587 | < 0.1% |
| -73.882744 | 558 | < 0.1% |
| -73.89083 | 539 | < 0.1% |
| Other values (97160) | 1715689 | 88.0% |
| (Missing) | 224428 | 11.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -201.35999 | 1 | < 0.1% |
| -201.23706 | 105 | < 0.1% |
| -89.13527 | 1 | < 0.1% |
| -86.76847 | 1 | < 0.1% |
| -79.61955 | 1 | < 0.1% |
| -79.00183 | 1 | < 0.1% |
| -76.2634 | 1 | < 0.1% |
| -76.02163 | 1 | < 0.1% |
| -74.742 | 7 | < 0.1% |
| -74.25496 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0 | 3802 | 0.2% |
| -32.768513 | 16 | < 0.1% |
| -47.209625 | 3 | < 0.1% |
| -73.66301 | 1 | < 0.1% |
| -73.70055 | 2 | < 0.1% |
| -73.700584 | 11 | < 0.1% |
| -73.7005968 | 10 | < 0.1% |
| -73.70061 | 1 | < 0.1% |
| -73.70071 | 4 | < 0.1% |
| -73.70073 | 1 | < 0.1% |
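The extreme values listed for the two coordinate variables (3802 zeros, longitudes near -201, latitudes above 42) look like placeholders or entry errors rather than real crash sites. One possible way to flag them is a bounding-box check. The sketch below assumes the records are meant to fall within New York City, as the BOROUGH values suggest; the box limits and the helper name `is_plausible_coordinate` are illustrative assumptions, not values taken from the report.

```python
from typing import Optional

# Hypothetical bounding box roughly enclosing New York City (an assumption).
LAT_RANGE = (40.4, 41.0)
LON_RANGE = (-74.3, -73.6)


def is_plausible_coordinate(lat: Optional[float], lon: Optional[float]) -> bool:
    """Return True if both coordinates are present and inside the assumed box."""
    if lat is None or lon is None:
        return False
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


# The third pair combines a reported longitude with an illustrative latitude.
examples = [(40.861862, -73.91282), (0.0, 0.0), (40.7, -201.23706), (None, None)]
for lat, lon in examples:
    print((lat, lon), is_plausible_coordinate(lat, lon))
```

Treating rejected coordinates as missing would also remove most of the skewness and kurtosis reported for these two variables, since those are driven by a small number of outliers.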
LOCATION
Categorical
| Distinct | 262859 |
|---|---|
| Distinct (%) | 15.2% |
| Missing | 224428 |
| Missing (%) | 11.5% |
| Memory size | 14.9 MiB |
Length
| Max length | 25 |
|---|---|
| Median length | 24 |
| Mean length | 22.85448 |
| Min length | 10 |
Characters and Unicode
| Total characters | 39428594 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 6 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 142573 |
|---|---|
| Unique (%) | 8.3% |
Sample
| 1st row | (40.667202, -73.8665) |
|---|---|
| 2nd row | (40.683304, -73.917274) |
| 3rd row | (40.709183, -73.956825) |
| 4th row | (40.86816, -73.83148) |
| 5th row | (40.67172, -73.8971) |
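The samples show LOCATION as a string holding a "(latitude, longitude)" pair, which is why it is profiled as text rather than as numbers. A minimal sketch of turning such a string back into two floats, assuming every non-missing value follows the format of the samples above; `parse_location` is a hypothetical helper name:

```python
import ast
from typing import Tuple


def parse_location(location: str) -> Tuple[float, float]:
    """Parse a '(lat, lon)' string into a (lat, lon) tuple of floats."""
    # literal_eval only evaluates Python literals, so it is safe on raw text.
    lat, lon = ast.literal_eval(location)
    return float(lat), float(lon)


print(parse_location("(40.667202, -73.8665)"))
```

Because the pair appears to duplicate the LATITUDE and LONGITUDE variables (same missing count, 224428), dropping LOCATION after a check like this may be a reasonable simplification.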
Common Values
| Value | Count | Frequency (%) |
| (0.0, 0.0) | 3802 | 0.2% |
| (40.861862, -73.91282) | 685 | < 0.1% |
| (40.608757, -74.038086) | 670 | < 0.1% |
| (40.696033, -73.98453) | 646 | < 0.1% |
| (40.8047, -73.91243) | 597 | < 0.1% |
| (40.6960346, -73.9845292) | 587 | < 0.1% |
| (40.675735, -73.89686) | 504 | < 0.1% |
| (40.7606005, -73.9643142) | 474 | < 0.1% |
| (40.820305, -73.89083) | 467 | < 0.1% |
| (40.798256, -73.82744) | 462 | < 0.1% |
| Other values (262849) | 1716308 | 88.0% |
| (Missing) | 224428 | 11.5% |
Words
| Value | Count | Frequency (%) |
| 0.0 | 7604 | 0.2% |
| 40.861862 | 798 | < 0.1% |
| 73.91282 | 717 | < 0.1% |
| 40.696033 | 711 | < 0.1% |
| 73.98453 | 697 | < 0.1% |
| 40.8047 | 691 | < 0.1% |
| 73.89063 | 689 | < 0.1% |
| 74.038086 | 672 | < 0.1% |
| 40.608757 | 670 | < 0.1% |
| 73.91243 | 651 | < 0.1% |
| Other values (221738) | 3436504 | 99.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 7 | 4323009 | |
| 4 | 3738369 | 9.5% |
| . | 3450404 | 8.8% |
| 3 | 3291405 | 8.3% |
| 0 | 3196746 | 8.1% |
| 9 | 2545978 | 6.5% |
| 8 | 2492909 | 6.3% |
| 6 | 2459003 | 6.2% |
| 5 | 1969952 | 5.0% |
| ( | 1725202 | 4.4% |
| Other values (6) | 10235617 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 27355982 | |
| Other Punctuation | 5175606 | 13.1% |
| Open Punctuation | 1725202 | 4.4% |
| Space Separator | 1725202 | 4.4% |
| Close Punctuation | 1725202 | 4.4% |
| Dash Punctuation | 1721400 | 4.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 7 | 4323009 | |
| 4 | 3738369 | |
| 3 | 3291405 | |
| 0 | 3196746 | |
| 9 | 2545978 | |
| 8 | 2492909 | |
| 6 | 2459003 | |
| 5 | 1969952 | |
| 2 | 1685841 | 6.2% |
| 1 | 1652770 | 6.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3450404 | |
| , | 1725202 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1725202 |
Space Separator
| Value | Count | Frequency (%) |
| 1725202 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1725202 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1721400 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 39428594 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 7 | 4323009 | |
| 4 | 3738369 | 9.5% |
| . | 3450404 | 8.8% |
| 3 | 3291405 | 8.3% |
| 0 | 3196746 | 8.1% |
| 9 | 2545978 | 6.5% |
| 8 | 2492909 | 6.3% |
| 6 | 2459003 | 6.2% |
| 5 | 1969952 | 5.0% |
| ( | 1725202 | 4.4% |
| Other values (6) | 10235617 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 39428594 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 7 | 4323009 | |
| 4 | 3738369 | 9.5% |
| . | 3450404 | 8.8% |
| 3 | 3291405 | 8.3% |
| 0 | 3196746 | 8.1% |
| 9 | 2545978 | 6.5% |
| 8 | 2492909 | 6.3% |
| 6 | 2459003 | 6.2% |
| 5 | 1969952 | 5.0% |
| ( | 1725202 | 4.4% |
| Other values (6) | 10235617 |
ON STREET NAME
Categorical
| Distinct | 17394 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 405870 |
| Missing (%) | 20.8% |
| Memory size | 14.9 MiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 30.523924 |
| Min length | 2 |
Characters and Unicode
| Total characters | 47121613 |
|---|---|
| Distinct characters | 75 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 6127 |
|---|---|
| Unique (%) | 0.4% |
Sample
| 1st row | WHITESTONE EXPRESSWAY |
|---|---|
| 2nd row | QUEENSBORO BRIDGE UPPER |
| 3rd row | THROGS NECK BRIDGE |
| 4th row | SARATOGA AVENUE |
| 5th row | MAJOR DEEGAN EXPRESSWAY RAMP |
Common Values
| Value | Count | Frequency (%) |
| BROADWAY | 17277 | 0.9% |
| ATLANTIC AVENUE | 15322 | 0.8% |
| BELT PARKWAY | 13543 | 0.7% |
| 3 AVENUE | 12476 | 0.6% |
| NORTHERN BOULEVARD | 11965 | 0.6% |
| LONG ISLAND EXPRESSWAY | 9928 | 0.5% |
| BROOKLYN QUEENS EXPRESSWAY | 9743 | 0.5% |
| FLATBUSH AVENUE | 9741 | 0.5% |
| LINDEN BOULEVARD | 9587 | 0.5% |
| QUEENS BOULEVARD | 9368 | 0.5% |
| Other values (17384) | 1424810 | 73.1% |
| (Missing) | 405870 | 20.8% |
Words
| Value | Count | Frequency (%) |
| avenue | 575907 | 16.2% |
| street | 495010 | 13.9% |
| east | 146330 | 4.1% |
| boulevard | 120725 | 3.4% |
| west | 109261 | 3.1% |
| parkway | 68464 | 1.9% |
| road | 64584 | 1.8% |
| expressway | 57674 | 1.6% |
| island | 27790 | 0.8% |
| queens | 25364 | 0.7% |
| Other values (5330) | 1869278 | 52.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 27442233 | ||
| E | 3471281 | 7.4% |
| A | 1837349 | 3.9% |
| T | 1737082 | 3.7% |
| R | 1570485 | 3.3% |
| N | 1344582 | 2.9% |
| S | 1327418 | 2.8% |
| U | 924511 | 2.0% |
| O | 818479 | 1.7% |
| V | 805864 | 1.7% |
| Other values (65) | 5842329 | 12.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Space Separator | 27442233 | |
| Uppercase Letter | 18442420 | |
| Decimal Number | 1115475 | 2.4% |
| Lowercase Letter | 111374 | 0.2% |
| Other Punctuation | 4167 | < 0.1% |
| Open Punctuation | 2888 | < 0.1% |
| Close Punctuation | 2884 | < 0.1% |
| Dash Punctuation | 170 | < 0.1% |
| Control | 1 | < 0.1% |
| Math Symbol | 1 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 3471281 | |
| A | 1837349 | |
| T | 1737082 | |
| R | 1570485 | 8.5% |
| N | 1344582 | 7.3% |
| S | 1327418 | 7.2% |
| U | 924511 | 5.0% |
| O | 818479 | 4.4% |
| V | 805864 | 4.4% |
| L | 604948 | 3.3% |
| Other values (16) | 4000421 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 14784 | |
| r | 9802 | 8.8% |
| n | 9434 | 8.5% |
| a | 9201 | 8.3% |
| t | 8067 | 7.2% |
| s | 6849 | 6.1% |
| o | 6586 | 5.9% |
| y | 5582 | 5.0% |
| l | 5182 | 4.7% |
| d | 4295 | 3.9% |
| Other values (16) | 31592 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 253017 | |
| 3 | 126459 | |
| 2 | 124841 | |
| 4 | 105856 | |
| 5 | 103760 | |
| 6 | 90709 | 8.1% |
| 8 | 83796 | 7.5% |
| 7 | 82447 | 7.4% |
| 9 | 73625 | 6.6% |
| 0 | 70965 | 6.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3062 | |
| / | 970 | 23.3% |
| & | 61 | 1.5% |
| ' | 36 | 0.9% |
| , | 16 | 0.4% |
| # | 16 | 0.4% |
| @ | 6 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| 27442233 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2888 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2884 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 170 |
Control
| Value | Count | Frequency (%) |
| | 1 |
Math Symbol
| Value | Count | Frequency (%) |
| > | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 28567819 | |
| Latin | 18553794 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 3471281 | |
| A | 1837349 | |
| T | 1737082 | |
| R | 1570485 | 8.5% |
| N | 1344582 | 7.2% |
| S | 1327418 | 7.2% |
| U | 924511 | 5.0% |
| O | 818479 | 4.4% |
| V | 805864 | 4.3% |
| L | 604948 | 3.3% |
| Other values (42) | 4111795 |
Common
| Value | Count | Frequency (%) |
| 27442233 | ||
| 1 | 253017 | 0.9% |
| 3 | 126459 | 0.4% |
| 2 | 124841 | 0.4% |
| 4 | 105856 | 0.4% |
| 5 | 103760 | 0.4% |
| 6 | 90709 | 0.3% |
| 8 | 83796 | 0.3% |
| 7 | 82447 | 0.3% |
| 9 | 73625 | 0.3% |
| Other values (13) | 81076 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 47121613 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 27442233 | ||
| E | 3471281 | 7.4% |
| A | 1837349 | 3.9% |
| T | 1737082 | 3.7% |
| R | 1570485 | 3.3% |
| N | 1344582 | 2.9% |
| S | 1327418 | 2.8% |
| U | 924511 | 2.0% |
| O | 818479 | 1.7% |
| V | 805864 | 1.7% |
| Other values (65) | 5842329 | 12.4% |
CROSS STREET NAME
Categorical
| Distinct | 19731 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 719709 |
| Missing (%) | 36.9% |
| Memory size | 14.9 MiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 23.181595 |
| Min length | 1 |
Characters and Unicode
| Total characters | 28511530 |
|---|---|
| Distinct characters | 76 |
| Distinct categories | 12 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 5958 |
|---|---|
| Unique (%) | 0.5% |
Sample
| 1st row | 20 AVENUE |
|---|---|
| 2nd row | DECATUR STREET |
| 3rd row | EAST 43 STREET |
| 4th row | EAST GATE PLAZA |
| 5th row | west 80 street -west 81 street |
Common Values
| Value | Count | Frequency (%) |
| 3 AVENUE | 9843 | 0.5% |
| BROADWAY | 9685 | 0.5% |
| 2 AVENUE | 8421 | 0.4% |
| 5 AVENUE | 7051 | 0.4% |
| 7 AVENUE | 6634 | 0.3% |
| 8 AVENUE | 6580 | 0.3% |
| 3 AVENUE | 6126 | 0.3% |
| BROADWAY | 5680 | 0.3% |
| 1 AVENUE | 5318 | 0.3% |
| PARK AVENUE | 4847 | 0.2% |
| Other values (19721) | 1159736 | 59.5% |
| (Missing) | 719709 | 36.9% |
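With a median length of 32 characters and more space separators than letters, the street-name values appear to be padded with whitespace, and at least one sample is lowercase. That would also explain why "3 AVENUE" and "BROADWAY" each appear twice in the table above, although the report does not confirm it. A minimal normalization sketch under that assumption; `normalize_street_name` is a hypothetical helper name:

```python
import re


def normalize_street_name(name: str) -> str:
    """Collapse runs of whitespace, trim the ends, and upper-case the name."""
    return re.sub(r"\s+", " ", name).strip().upper()


# Illustrative inputs: a name padded to 32 characters, the same name unpadded,
# and the lowercase value from the sample rows.
raw = ["3 AVENUE" + " " * 24, "3 AVENUE", "west 80 street -west 81 street"]
print([normalize_street_name(n) for n in raw])
```

If the duplicates do stem from padding, normalizing before profiling should reduce the distinct count and merge the repeated rows.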
Words
| Value | Count | Frequency (%) |
| avenue | 538295 | 19.8% |
| street | 438533 | 16.1% |
| east | 107028 | 3.9% |
| west | 68702 | 2.5% |
| boulevard | 65125 | 2.4% |
| road | 52885 | 1.9% |
| place | 32356 | 1.2% |
| parkway | 25218 | 0.9% |
| 3 | 18036 | 0.7% |
| park | 16679 | 0.6% |
| Other values (5427) | 1357973 | 49.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 14042126 | ||
| E | 2798427 | 9.8% |
| T | 1386719 | 4.9% |
| A | 1350774 | 4.7% |
| R | 1092357 | 3.8% |
| N | 1022407 | 3.6% |
| S | 943233 | 3.3% |
| U | 739542 | 2.6% |
| V | 674549 | 2.4% |
| O | 549889 | 1.9% |
| Other values (66) | 3911507 | 13.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Space Separator | 14042126 | |
| Uppercase Letter | 13389512 | |
| Decimal Number | 1022350 | 3.6% |
| Lowercase Letter | 57207 | 0.2% |
| Other Punctuation | 296 | < 0.1% |
| Dash Punctuation | 27 | < 0.1% |
| Open Punctuation | 3 | < 0.1% |
| Close Punctuation | 3 | < 0.1% |
| Control | 2 | < 0.1% |
| Math Symbol | 2 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 2798427 | |
| T | 1386719 | |
| A | 1350774 | |
| R | 1092357 | 8.2% |
| N | 1022407 | 7.6% |
| S | 943233 | 7.0% |
| U | 739542 | 5.5% |
| V | 674549 | 5.0% |
| O | 549889 | 4.1% |
| L | 415765 | 3.1% |
| Other values (16) | 2415850 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 10693 | |
| t | 5976 | |
| a | 5625 | |
| r | 4693 | 8.2% |
| n | 4052 | 7.1% |
| s | 3760 | 6.6% |
| o | 2730 | 4.8% |
| v | 2670 | 4.7% |
| u | 2344 | 4.1% |
| l | 2046 | 3.6% |
| Other values (16) | 12618 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 226026 | |
| 2 | 120574 | |
| 3 | 112276 | |
| 4 | 92232 | |
| 5 | 92165 | |
| 8 | 81356 | 8.0% |
| 7 | 81189 | 7.9% |
| 6 | 80691 | 7.9% |
| 9 | 70153 | 6.9% |
| 0 | 65688 | 6.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 124 | |
| . | 66 | |
| ' | 51 | |
| & | 49 | 16.6% |
| ? | 3 | 1.0% |
| , | 3 | 1.0% |
Space Separator
| Value | Count | Frequency (%) |
| 14042126 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 27 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 3 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 3 |
Control
| Value | Count | Frequency (%) |
| | 2 |
Math Symbol
| Value | Count | Frequency (%) |
| + | 2 |
Other Symbol
| Value | Count | Frequency (%) |
| � | 1 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ` | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 15064811 | |
| Latin | 13446719 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 2798427 | |
| T | 1386719 | |
| A | 1350774 | |
| R | 1092357 | 8.1% |
| N | 1022407 | 7.6% |
| S | 943233 | 7.0% |
| U | 739542 | 5.5% |
| V | 674549 | 5.0% |
| O | 549889 | 4.1% |
| L | 415765 | 3.1% |
| Other values (42) | 2473057 |
Common
| Value | Count | Frequency (%) |
| 14042126 | ||
| 1 | 226026 | 1.5% |
| 2 | 120574 | 0.8% |
| 3 | 112276 | 0.7% |
| 4 | 92232 | 0.6% |
| 5 | 92165 | 0.6% |
| 8 | 81356 | 0.5% |
| 7 | 81189 | 0.5% |
| 6 | 80691 | 0.5% |
| 9 | 70153 | 0.5% |
| Other values (14) | 66023 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 28511529 | |
| Specials | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 14042126 | ||
| E | 2798427 | 9.8% |
| T | 1386719 | 4.9% |
| A | 1350774 | 4.7% |
| R | 1092357 | 3.8% |
| N | 1022407 | 3.6% |
| S | 943233 | 3.3% |
| U | 739542 | 2.6% |
| V | 674549 | 2.4% |
| O | 549889 | 1.9% |
| Other values (65) | 3911506 | 13.7% |
Specials
| Value | Count | Frequency (%) |
| � | 1 |
OFF STREET NAME
Categorical
| Distinct | 202042 |
|---|---|
| Distinct (%) | 64.4% |
| Missing | 1636131 |
| Missing (%) | 83.9% |
| Memory size | 14.9 MiB |
Length
| Max length | 40 |
|---|---|
| Median length | 40 |
| Mean length | 37.427258 |
| Min length | 8 |
Characters and Unicode
| Total characters | 11733408 |
|---|---|
| Distinct characters | 84 |
| Distinct categories | 12 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 157896 |
|---|---|
| Unique (%) | 50.4% |
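The percentages reported for this variable can be cross-checked from the raw counts. The sketch below assumes, as the figures suggest, that Missing (%) is taken over all observations while Distinct (%) and Unique (%) are taken over the non-missing values only. The variable names are illustrative and not part of the report.

```python
# Counts copied from the report tables; the names are illustrative.
n_rows = 1_949_630      # Number of observations
n_missing = 1_636_131   # Missing
n_distinct = 202_042    # Distinct
n_unique = 157_896      # Unique (values that occur exactly once)

# Assumption: distinct and unique shares use the non-missing count as denominator.
n_valid = n_rows - n_missing

missing_pct = 100 * n_missing / n_rows
distinct_pct = 100 * n_distinct / n_valid
unique_pct = 100 * n_unique / n_valid

print(f"Missing (%):  {missing_pct:.1f}")
print(f"Distinct (%): {distinct_pct:.1f}")
print(f"Unique (%):   {unique_pct:.1f}")
```

Under that assumption the three values round to the percentages shown in the tables above.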
Sample
| 1st row | 1211 LORING AVENUE |
|---|---|
| 2nd row | 344 BAYCHESTER AVENUE |
| 3rd row | 2047 PITKIN AVENUE |
| 4th row | 480 DEAN STREET |
| 5th row | 878 FLATBUSH AVENUE |
Common Values
| Value | Count | Frequency (%) |
| 772 EDGEWATER ROAD | 402 | < 0.1% |
| 110-00 ROCKAWAY BOULEVARD | 261 | < 0.1% |
| 2800 VICTORY BOULEVARD | 236 | < 0.1% |
| 2655 RICHMOND AVENUE | 169 | < 0.1% |
| 2100 BARTOW AVENUE | 167 | < 0.1% |
| 501 GATEWAY DRIVE | 164 | < 0.1% |
| PARKING LOT 110-00 ROCKAWAY BOULEVARD | 150 | < 0.1% |
| 625 ATLANTIC AVENUE | 145 | < 0.1% |
| 450 FLATBUSH AVENUE | 145 | < 0.1% |
| 3 AVENUE | 142 | < 0.1% |
| Other values (202032) | 311518 | 16.0% |
| (Missing) | 1636131 | 83.9% |
Words
| Value | Count | Frequency (%) |
| avenue | 124198 | 11.9% |
| street | 112028 | 10.7% |
| east | 29581 | 2.8% |
| west | 21488 | 2.1% |
| boulevard | 20193 | 1.9% |
| road | 14814 | 1.4% |
| lot | 7881 | 0.8% |
| parking | 7267 | 0.7% |
| of | 6872 | 0.7% |
| parkway | 6198 | 0.6% |
| Other values (26931) | 695441 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 6606625 | |
| E | 716201 | 6.1% |
| T | 391510 | 3.3% |
| A | 370198 | 3.2% |
| R | 306759 | 2.6% |
| N | 270609 | 2.3% |
| S | 256593 | 2.2% |
| 1 | 248197 | 2.1% |
| U | 183288 | 1.6% |
| O | 173041 | 1.5% |
| Other values (74) | 2210387 | 18.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Space Separator | 6606625 | |
| Uppercase Letter | 3715937 | |
| Decimal Number | 1300144 | 11.1% |
| Dash Punctuation | 74130 | 0.6% |
| Lowercase Letter | 22371 | 0.2% |
| Other Punctuation | 9568 | 0.1% |
| Open Punctuation | 2311 | < 0.1% |
| Close Punctuation | 2300 | < 0.1% |
| Modifier Symbol | 16 | < 0.1% |
| Connector Punctuation | 3 | < 0.1% |
| Other values (2) | 3 | < 0.1% |
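The category names in this table correspond to Unicode general categories. As a rough illustration of how such a tally could be produced, the sketch below counts the categories of a single value using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by two-letter Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the common values listed for this variable.
counts = category_counts("110-00 ROCKAWAY BOULEVARD")

# Lu = Uppercase Letter, Nd = Decimal Number,
# Pd = Dash Punctuation, Zs = Space Separator
for code, n in counts.most_common():
    print(code, n)
```

Summing such per-value counters over a whole column would give totals of the kind reported in the table.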
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 716201 | |
| T | 391510 | |
| A | 370198 | |
| R | 306759 | |
| N | 270609 | 7.3% |
| S | 256593 | 6.9% |
| U | 183288 | 4.9% |
| O | 173041 | 4.7% |
| V | 171574 | 4.6% |
| L | 130718 | 3.5% |
| Other values (16) | 745446 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 3738 | |
| t | 2639 | |
| r | 2068 | |
| a | 1942 | 8.7% |
| n | 1479 | 6.6% |
| s | 1457 | 6.5% |
| o | 1175 | 5.3% |
| v | 979 | 4.4% |
| d | 903 | 4.0% |
| l | 883 | 3.9% |
| Other values (16) | 5108 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 6426 | |
| & | 1740 | 18.2% |
| . | 997 | 10.4% |
| @ | 145 | 1.5% |
| , | 82 | 0.9% |
| : | 59 | 0.6% |
| # | 54 | 0.6% |
| ' | 50 | 0.5% |
| * | 8 | 0.1% |
| ? | 3 | < 0.1% |
| Other values (2) | 4 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 248197 | |
| 2 | 168844 | |
| 0 | 146872 | |
| 3 | 132733 | |
| 5 | 131574 | |
| 4 | 116028 | |
| 6 | 94784 | 7.3% |
| 7 | 92718 | 7.1% |
| 8 | 87497 | 6.7% |
| 9 | 80897 | 6.2% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2299 | |
| ] | 1 | < 0.1% |
Control
| Value | Count | Frequency (%) |
| (control character) | 1 | |
| (control character) | 1 | |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6606625 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 74130 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2311 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ` | 16 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 3 |
Math Symbol
| Value | Count | Frequency (%) |
| = | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 7995100 | |
| Latin | 3738308 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 716201 | |
| T | 391510 | |
| A | 370198 | |
| R | 306759 | 8.2% |
| N | 270609 | 7.2% |
| S | 256593 | 6.9% |
| U | 183288 | 4.9% |
| O | 173041 | 4.6% |
| V | 171574 | 4.6% |
| L | 130718 | 3.5% |
| Other values (42) | 767817 |
Common
| Value | Count | Frequency (%) |
| (space) | 6606625 | |
| 1 | 248197 | 3.1% |
| 2 | 168844 | 2.1% |
| 0 | 146872 | 1.8% |
| 3 | 132733 | 1.7% |
| 5 | 131574 | 1.6% |
| 4 | 116028 | 1.5% |
| 6 | 94784 | 1.2% |
| 7 | 92718 | 1.2% |
| 8 | 87497 | 1.1% |
| Other values (22) | 169228 | 2.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11733408 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 6606625 | |
| E | 716201 | 6.1% |
| T | 391510 | 3.3% |
| A | 370198 | 3.2% |
| R | 306759 | 2.6% |
| N | 270609 | 2.3% |
| S | 256593 | 2.2% |
| 1 | 248197 | 2.1% |
| U | 183288 | 1.6% |
| O | 173041 | 1.5% |
| Other values (74) | 2210387 | 18.8% |
NUMBER OF PERSONS INJURED
Numeric
| Distinct | 28 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 18 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29388617 |
| Minimum | 0 |
|---|---|
| Maximum | 43 |
| Zeros | 1527469 |
| Zeros (%) | 78.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 43 |
| Range | 43 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6854355 |
|---|---|
| Coefficient of variation (CV) | 2.3323163 |
| Kurtosis | 51.490826 |
| Mean | 0.29388617 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.3243749 |
| Sum | 572964 |
| Variance | 0.46982183 |
| Monotonicity | Not monotonic |
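Several of the descriptive statistics above are linked by simple identities, which gives a quick way to sanity-check the report. The sketch below recomputes the mean, the coefficient of variation, and the variance from the other reported figures. The variable names are illustrative.

```python
# Figures copied from the tables above; the names are illustrative.
n_rows = 1_949_630       # Number of observations
n_missing = 18           # Missing
total = 572_964          # Sum
mean = 0.29388617        # Mean
std = 0.6854355          # Standard deviation
variance = 0.46982183    # Variance
cv = 2.3323163           # Coefficient of variation (CV)

n_valid = n_rows - n_missing

# The mean is the sum divided by the number of non-missing values.
mean_check = total / n_valid
# The coefficient of variation is the standard deviation over the mean.
cv_check = std / mean
# The variance is the square of the standard deviation.
var_check = std ** 2

print(mean_check, cv_check, var_check)
```

Each recomputed value should agree with the reported one up to rounding of the printed digits.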
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1527469 | 78.3% |
| 1 | 327457 | 16.8% |
| 2 | 61714 | 3.2% |
| 3 | 20165 | 1.0% |
| 4 | 7568 | 0.4% |
| 5 | 2954 | 0.2% |
| 6 | 1200 | 0.1% |
| 7 | 528 | < 0.1% |
| 8 | 218 | < 0.1% |
| 9 | 117 | < 0.1% |
| Other values (18) | 222 | < 0.1% |
Extreme Values (minimum)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1527469 | 78.3% |
| 1 | 327457 | 16.8% |
| 2 | 61714 | 3.2% |
| 3 | 20165 | 1.0% |
| 4 | 7568 | 0.4% |
| 5 | 2954 | 0.2% |
| 6 | 1200 | 0.1% |
| 7 | 528 | < 0.1% |
| 8 | 218 | < 0.1% |
| 9 | 117 | < 0.1% |
Extreme Values (maximum)
| Value | Count | Frequency (%) |
|---|---|---|
| 43 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 32 | 1 | < 0.1% |
| 31 | 1 | < 0.1% |
| 27 | 1 | < 0.1% |
| 24 | 3 | < 0.1% |
| 22 | 3 | < 0.1% |
| 20 | 2 | < 0.1% |
| 19 | 4 | < 0.1% |
| 18 | 5 | < 0.1% |
NUMBER OF PERSONS KILLED
Numeric
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 31 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.001400288 |
| Minimum | 0 |
|---|---|
| Maximum | 8 |
| Zeros | 1946976 |
| Zeros (%) | 99.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.039450084 |
|---|---|
| Coefficient of variation (CV) | 28.172837 |
| Kurtosis | 2101.6931 |
| Mean | 0.001400288 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 34.881399 |
| Sum | 2730 |
| Variance | 0.0015563092 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1946976 | 99.9% |
| 1 | 2542 | 0.1% |
| 2 | 65 | < 0.1% |
| 3 | 11 | < 0.1% |
| 4 | 3 | < 0.1% |
| 8 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| (Missing) | 31 | < 0.1% |
Extreme Values (minimum)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1946976 | 99.9% |
| 1 | 2542 | 0.1% |
| 2 | 65 | < 0.1% |
| 3 | 11 | < 0.1% |
| 4 | 3 | < 0.1% |
| 5 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
Extreme Values (maximum)
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 4 | 3 | < 0.1% |
| 3 | 11 | < 0.1% |
| 2 | 65 | < 0.1% |
| 1 | 2542 | 0.1% |
| 0 | 1946976 | 99.9% |
NUMBER OF PEDESTRIANS INJURED
Numeric
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.05384355 |
| Minimum | 0 |
|---|---|
| Maximum | 27 |
| Zeros | 1848975 |
| Zeros (%) | 94.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 27 |
| Range | 27 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.23824339 |
|---|---|
| Coefficient of variation (CV) | 4.4247341 |
| Kurtosis | 127.42717 |
| Mean | 0.05384355 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.7195638 |
| Sum | 104975 |
| Variance | 0.056759913 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1848975 | 94.8% |
| 1 | 96974 | 5.0% |
| 2 | 3251 | 0.2% |
| 3 | 332 | < 0.1% |
| 4 | 55 | < 0.1% |
| 5 | 23 | < 0.1% |
| 6 | 11 | < 0.1% |
| 7 | 3 | < 0.1% |
| 9 | 2 | < 0.1% |
| 27 | 1 | < 0.1% |
| Other values (3) | 3 | < 0.1% |
Extreme Values (minimum)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1848975 | 94.8% |
| 1 | 96974 | 5.0% |
| 2 | 3251 | 0.2% |
| 3 | 332 | < 0.1% |
| 4 | 55 | < 0.1% |
| 5 | 23 | < 0.1% |
| 6 | 11 | < 0.1% |
| 7 | 3 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 2 | < 0.1% |
Extreme Values (maximum)
| Value | Count | Frequency (%) |
|---|---|---|
| 27 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 9 | 2 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 3 | < 0.1% |
| 6 | 11 | < 0.1% |
| 5 | 23 | < 0.1% |
| 4 | 55 | < 0.1% |
| 3 | 332 | < 0.1% |
NUMBER OF PEDESTRIANS KILLED
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| 0 | 1948258 |
| 1 | 1359 |
| 2 | 12 |
| 6 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1949630 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1948258 | |
| 1 | 1359 | 0.1% |
| 2 | 12 | < 0.1% |
| 6 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1948258 | |
| 1 | 1359 | 0.1% |
| 2 | 12 | < 0.1% |
| 6 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1949630 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1948258 | |
| 1 | 1359 | 0.1% |
| 2 | 12 | < 0.1% |
| 6 | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1949630 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1948258 | |
| 1 | 1359 | 0.1% |
| 2 | 12 | < 0.1% |
| 6 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1949630 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1948258 | |
| 1 | 1359 | 0.1% |
| 2 | 12 | < 0.1% |
| 6 | 1 | < 0.1% |
NUMBER OF CYCLIST INJURED
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| 0 | 1900803 |
| 1 | 48289 |
| 2 | 516 |
| 3 | 21 |
| 4 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1949630 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1900803 | |
| 1 | 48289 | 2.5% |
| 2 | 516 | < 0.1% |
| 3 | 21 | < 0.1% |
| 4 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1900803 | |
| 1 | 48289 | 2.5% |
| 2 | 516 | < 0.1% |
| 3 | 21 | < 0.1% |
| 4 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1949630 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1900803 | |
| 1 | 48289 | 2.5% |
| 2 | 516 | < 0.1% |
| 3 | 21 | < 0.1% |
| 4 | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1949630 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1900803 | |
| 1 | 48289 | 2.5% |
| 2 | 516 | < 0.1% |
| 3 | 21 | < 0.1% |
| 4 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1949630 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1900803 | |
| 1 | 48289 | 2.5% |
| 2 | 516 | < 0.1% |
| 3 | 21 | < 0.1% |
| 4 | 1 | < 0.1% |
NUMBER OF CYCLIST KILLED
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| 0 | 1949429 |
| 1 | 200 |
| 2 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1949630 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1949429 | |
| 1 | 200 | < 0.1% |
| 2 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1949429 | |
| 1 | 200 | < 0.1% |
| 2 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1949630 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1949429 | |
| 1 | 200 | < 0.1% |
| 2 | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1949630 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1949429 | |
| 1 | 200 | < 0.1% |
| 2 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1949630 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1949429 | |
| 1 | 200 | < 0.1% |
| 2 | 1 | < 0.1% |
NUMBER OF MOTORIST INJURED
Numeric
| Distinct | 28 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.21231875 |
| Minimum | 0 |
|---|---|
| Maximum | 43 |
| Zeros | 1678914 |
| Zeros (%) | 86.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 43 |
| Range | 43 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.64677755 |
|---|---|
| Coefficient of variation (CV) | 3.0462574 |
| Kurtosis | 64.177399 |
| Mean | 0.21231875 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.2008158 |
| Sum | 413943 |
| Variance | 0.4183212 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1678914 | 86.1% |
| 1 | 182107 | 9.3% |
| 2 | 56524 | 2.9% |
| 3 | 19562 | 1.0% |
| 4 | 7416 | 0.4% |
| 5 | 2907 | 0.1% |
| 6 | 1159 | 0.1% |
| 7 | 504 | < 0.1% |
| 8 | 210 | < 0.1% |
| 9 | 113 | < 0.1% |
| Other values (18) | 214 | < 0.1% |
Extreme Values (minimum)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1678914 | 86.1% |
| 1 | 182107 | 9.3% |
| 2 | 56524 | 2.9% |
| 3 | 19562 | 1.0% |
| 4 | 7416 | 0.4% |
| 5 | 2907 | 0.1% |
| 6 | 1159 | 0.1% |
| 7 | 504 | < 0.1% |
| 8 | 210 | < 0.1% |
| 9 | 113 | < 0.1% |
Extreme Values (maximum)
| Value | Count | Frequency (%) |
|---|---|---|
| 43 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 31 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 24 | 3 | < 0.1% |
| 22 | 2 | < 0.1% |
| 21 | 1 | < 0.1% |
| 20 | 2 | < 0.1% |
| 19 | 3 | < 0.1% |
| 18 | 5 | < 0.1% |
NUMBER OF MOTORIST KILLED
Numeric
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.00056728713 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 1948606 |
| Zeros (%) | 99.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.025974572 |
|---|---|
| Coefficient of variation (CV) | 45.787347 |
| Kurtosis | 4260.525 |
| Mean | 0.00056728713 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 55.837664 |
| Sum | 1106 |
| Variance | 0.0006746784 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1948606 | 99.9% |
| 1 | 960 | < 0.1% |
| 2 | 50 | < 0.1% |
| 3 | 11 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
Extreme Values (minimum)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1948606 | 99.9% |
| 1 | 960 | < 0.1% |
| 2 | 50 | < 0.1% |
| 3 | 11 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
Extreme Values (maximum)
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| 3 | 11 | < 0.1% |
| 2 | 50 | < 0.1% |
| 1 | 960 | < 0.1% |
| 0 | 1948606 | 99.9% |
CONTRIBUTING FACTOR VEHICLE 1
Categorical
| Distinct | 61 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5919 |
| Missing (%) | 0.3% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| Unspecified | 676629 |
| Driver Inattention/Distraction | 384021 |
| Failure to Yield Right-of-Way | 114498 |
| Following Too Closely | 103076 |
| Backing Unsafely | 72926 |
| Other values (56) | 592561 |
Length
| Max length | 53 |
|---|---|
| Median length | 43 |
| Mean length | 19.383688 |
| Min length | 1 |
Characters and Unicode
| Total characters | 37676287 |
|---|---|
| Distinct characters | 55 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Aggressive Driving/Road Rage |
|---|---|
| 2nd row | Pavement Slippery |
| 3rd row | Following Too Closely |
| 4th row | Unspecified |
| 5th row | Unspecified |
Common Values
| Value | Count | Frequency (%) |
| Unspecified | 676629 | |
| Driver Inattention/Distraction | 384021 | |
| Failure to Yield Right-of-Way | 114498 | 5.9% |
| Following Too Closely | 103076 | 5.3% |
| Backing Unsafely | 72926 | 3.7% |
| Other Vehicular | 60620 | 3.1% |
| Passing or Lane Usage Improper | 52345 | 2.7% |
| Turning Improperly | 48314 | 2.5% |
| Passing Too Closely | 47509 | 2.4% |
| Fatigued/Drowsy | 47273 | 2.4% |
| Other values (51) | 336500 |
Words
| Value | Count | Frequency (%) |
| unspecified | 676629 | |
| driver | 413817 | 10.8% |
| inattention/distraction | 384021 | 10.0% |
| too | 150585 | 3.9% |
| closely | 150585 | 3.9% |
| to | 137701 | 3.6% |
| failure | 120287 | 3.1% |
| yield | 114498 | 3.0% |
| right-of-way | 114498 | 3.0% |
| following | 103076 | 2.7% |
| Other values (96) | 1478248 |
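The word table above looks consistent with lowercasing each value and splitting it on whitespace only: tokens such as `inattention/distraction` and `right-of-way` stay intact, and the count for `too` (150585) equals the sum of "Following Too Closely" (103076) and "Passing Too Closely" (47509). The sketch below illustrates that tokenization on a few values from this column. The helper `word_counts` is hypothetical, and the splitting rule is an assumption inferred from the table rather than a documented behaviour of the profiling tool.

```python
from collections import Counter


def word_counts(values):
    """Tally lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# A few category labels that appear in this column.
sample = [
    "Driver Inattention/Distraction",
    "Following Too Closely",
    "Passing Too Closely",
    "Failure to Yield Right-of-Way",
]

counts = word_counts(sample)
for word, n in counts.most_common():
    print(word, n)
```

Because slashes and hyphens are not treated as separators here, compound labels count as single words, which matches the tokens listed in the table.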
Most occurring characters
| Value | Count | Frequency (%) |
| i | 4261437 | 11.3% |
| e | 3847900 | 10.2% |
| n | 3271990 | 8.7% |
| t | 2597964 | 6.9% |
| o | 2207352 | 5.9% |
| r | 2192939 | 5.8% |
| s | 1969901 | 5.2% |
| (space) | 1900234 | 5.0% |
| a | 1846471 | 4.9% |
| c | 1467914 | 3.9% |
| Other values (45) | 12112185 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 30798764 | |
| Uppercase Letter | 4255134 | 11.3% |
| Space Separator | 1900234 | 5.0% |
| Other Punctuation | 487265 | 1.3% |
| Dash Punctuation | 230638 | 0.6% |
| Open Punctuation | 2020 | < 0.1% |
| Close Punctuation | 2020 | < 0.1% |
| Decimal Number | 212 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 4261437 | |
| e | 3847900 | |
| n | 3271990 | |
| t | 2597964 | |
| o | 2207352 | 7.2% |
| r | 2192939 | 7.1% |
| s | 1969901 | 6.4% |
| a | 1846471 | 6.0% |
| c | 1467914 | 4.8% |
| l | 1159451 | 3.8% |
| Other values (15) | 5975445 |
Uppercase Letter
| Value | Count | Frequency (%) |
| D | 936708 | |
| U | 884618 | |
| I | 544123 | |
| F | 278089 | 6.5% |
| C | 264871 | 6.2% |
| T | 235626 | 5.5% |
| P | 171170 | 4.0% |
| R | 156228 | 3.7% |
| L | 124678 | 2.9% |
| W | 115534 | 2.7% |
| Other values (12) | 543489 |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 101 | |
| 0 | 101 | |
| 1 | 10 | 4.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1900234 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 487265 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 230638 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2020 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2020 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 35053898 | |
| Common | 2622389 | 7.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 4261437 | |
| e | 3847900 | 11.0% |
| n | 3271990 | 9.3% |
| t | 2597964 | 7.4% |
| o | 2207352 | 6.3% |
| r | 2192939 | 6.3% |
| s | 1969901 | 5.6% |
| a | 1846471 | 5.3% |
| c | 1467914 | 4.2% |
| l | 1159451 | 3.3% |
| Other values (37) | 10230579 |
Common
| Value | Count | Frequency (%) |
| (space) | 1900234 | |
| / | 487265 | 18.6% |
| - | 230638 | 8.8% |
| ( | 2020 | 0.1% |
| ) | 2020 | 0.1% |
| 8 | 101 | < 0.1% |
| 0 | 101 | < 0.1% |
| 1 | 10 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 37676287 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 4261437 | 11.3% |
| e | 3847900 | 10.2% |
| n | 3271990 | 8.7% |
| t | 2597964 | 6.9% |
| o | 2207352 | 5.9% |
| r | 2192939 | 5.8% |
| s | 1969901 | 5.2% |
| (space) | 1900234 | 5.0% |
| a | 1846471 | 4.9% |
| c | 1467914 | 3.9% |
| Other values (45) | 12112185 |
CONTRIBUTING FACTOR VEHICLE 2
Categorical
| Distinct | 61 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 291964 |
| Missing (%) | 15.0% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| Unspecified | 1395623 |
| Driver Inattention/Distraction | 88639 |
| Other Vehicular | 30554 |
| Following Too Closely | 17526 |
| Failure to Yield Right-of-Way | 16327 |
| Other values (56) | 108997 |
Length
| Max length | 53 |
|---|---|
| Median length | 11 |
| Mean length | 13.038235 |
| Min length | 1 |
Characters and Unicode
| Total characters | 21613039 |
|---|---|
| Distinct characters | 55 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Unspecified |
|---|---|
| 2nd row | Unspecified |
| 3rd row | Unspecified |
| 4th row | Unspecified |
| 5th row | Unspecified |
Common Values
| Value | Count | Frequency (%) |
| Unspecified | 1395623 | |
| Driver Inattention/Distraction | 88639 | 4.5% |
| Other Vehicular | 30554 | 1.6% |
| Following Too Closely | 17526 | 0.9% |
| Failure to Yield Right-of-Way | 16327 | 0.8% |
| Passing or Lane Usage Improper | 11896 | 0.6% |
| Fatigued/Drowsy | 10833 | 0.6% |
| Turning Improperly | 8458 | 0.4% |
| Passing Too Closely | 8177 | 0.4% |
| Backing Unsafely | 7679 | 0.4% |
| Other values (51) | 61954 | 3.2% |
| (Missing) | 291964 | 15.0% |
Words
| Value | Count | Frequency (%) |
| unspecified | 1395623 | |
| driver | 94955 | 4.7% |
| inattention/distraction | 88639 | 4.4% |
| other | 31611 | 1.6% |
| vehicular | 30554 | 1.5% |
| too | 25703 | 1.3% |
| closely | 25703 | 1.3% |
| to | 20497 | 1.0% |
| passing | 20073 | 1.0% |
| lane | 18795 | 0.9% |
| Other values (96) | 279382 | 13.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 3409113 | |
| e | 3316582 | |
| n | 1936211 | |
| s | 1661264 | |
| c | 1575269 | |
| d | 1464569 | |
| p | 1461288 | |
| f | 1447832 | |
| U | 1429652 | |
| t | 584113 | 2.7% |
| Other values (45) | 3327146 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 18965693 | |
| Uppercase Letter | 2127710 | 9.8% |
| Space Separator | 373869 | 1.7% |
| Other Punctuation | 111920 | 0.5% |
| Dash Punctuation | 33242 | 0.2% |
| Open Punctuation | 278 | < 0.1% |
| Close Punctuation | 278 | < 0.1% |
| Decimal Number | 49 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 3409113 | |
| e | 3316582 | |
| n | 1936211 | |
| s | 1661264 | |
| c | 1575269 | |
| d | 1464569 | |
| p | 1461288 | |
| f | 1447832 | |
| t | 584113 | 3.1% |
| r | 508927 | 2.7% |
| Other values (15) | 1600525 |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 1429652 | |
| D | 211693 | 9.9% |
| I | 119019 | 5.6% |
| C | 49185 | 2.3% |
| F | 46042 | 2.2% |
| O | 41973 | 2.0% |
| T | 41533 | 2.0% |
| V | 39291 | 1.8% |
| P | 35055 | 1.6% |
| L | 27010 | 1.3% |
| Other values (12) | 87257 | 4.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 22 | |
| 0 | 22 | |
| 1 | 5 | 10.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 373869 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 111920 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 33242 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 278 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 278 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 21093403 | |
| Common | 519636 | 2.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 3409113 | |
| e | 3316582 | |
| n | 1936211 | |
| s | 1661264 | |
| c | 1575269 | |
| d | 1464569 | |
| p | 1461288 | |
| f | 1447832 | |
| U | 1429652 | |
| t | 584113 | 2.8% |
| Other values (37) | 2807510 |
Common
| Value | Count | Frequency (%) |
| (space) | 373869 | |
| / | 111920 | 21.5% |
| - | 33242 | 6.4% |
| ( | 278 | 0.1% |
| ) | 278 | 0.1% |
| 8 | 22 | < 0.1% |
| 0 | 22 | < 0.1% |
| 1 | 5 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 21613039 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 3409113 | |
| e | 3316582 | |
| n | 1936211 | |
| s | 1661264 | |
| c | 1575269 | |
| d | 1464569 | |
| p | 1461288 | |
| f | 1447832 | |
| U | 1429652 | |
| t | 584113 | 2.7% |
| Other values (45) | 3327146 |
CONTRIBUTING FACTOR VEHICLE 3
Categorical
| Distinct | 51 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1812944 |
| Missing (%) | 93.0% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| Unspecified | 127424 |
| Other Vehicular | 2534 |
| Driver Inattention/Distraction | 1804 |
| Following Too Closely | 1746 |
| Fatigued/Drowsy | 853 |
| Other values (46) | 2325 |
Length
| Max length | 53 |
|---|---|
| Median length | 11 |
| Mean length | 11.655334 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1593121 |
|---|---|
| Distinct characters | 55 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 5 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Unspecified |
|---|---|
| 2nd row | Unspecified |
| 3rd row | Unspecified |
| 4th row | Unspecified |
| 5th row | Unspecified |
Common Values
| Value | Count | Frequency (%) |
| Unspecified | 127424 | 6.5% |
| Other Vehicular | 2534 | 0.1% |
| Driver Inattention/Distraction | 1804 | 0.1% |
| Following Too Closely | 1746 | 0.1% |
| Fatigued/Drowsy | 853 | < 0.1% |
| Pavement Slippery | 371 | < 0.1% |
| Reaction to Uninvolved Vehicle | 195 | < 0.1% |
| Driver Inexperience | 169 | < 0.1% |
| Outside Car Distraction | 159 | < 0.1% |
| Traffic Control Disregarded | 150 | < 0.1% |
| Other values (41) | 1281 | 0.1% |
| (Missing) | 1812944 | 93.0% |
Words
| Value | Count | Frequency (%) |
| unspecified | 127424 | |
| other | 2574 | 1.7% |
| vehicular | 2534 | 1.7% |
| driver | 1973 | 1.3% |
| inattention/distraction | 1804 | 1.2% |
| too | 1792 | 1.2% |
| closely | 1792 | 1.2% |
| following | 1746 | 1.2% |
| fatigued/drowsy | 853 | 0.6% |
| pavement | 385 | 0.3% |
| Other values (79) | 5456 | 3.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 272243 | |
| i | 271136 | |
| n | 139663 | |
| s | 133824 | |
| c | 133326 | |
| d | 129434 | |
| p | 128948 | |
| f | 128260 | |
| U | 128006 | |
| o | 15718 | 1.0% |
| Other values (45) | 112563 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1427288 | |
| Uppercase Letter | 150937 | 9.5% |
| Space Separator | 11647 | 0.7% |
| Other Punctuation | 2921 | 0.2% |
| Dash Punctuation | 297 | < 0.1% |
| Open Punctuation | 12 | < 0.1% |
| Close Punctuation | 12 | < 0.1% |
| Decimal Number | 7 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 272243 | |
| i | 271136 | |
| n | 139663 | |
| s | 133824 | |
| c | 133326 | |
| d | 129434 | |
| p | 128948 | |
| f | 128260 | |
| o | 15718 | 1.1% |
| t | 14904 | 1.0% |
| Other values (15) | 59832 | 4.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 128006 | |
| D | 5205 | 3.4% |
| O | 2879 | 1.9% |
| F | 2836 | 1.9% |
| V | 2795 | 1.9% |
| I | 2280 | 1.5% |
| C | 2247 | 1.5% |
| T | 2039 | 1.4% |
| P | 647 | 0.4% |
| S | 506 | 0.3% |
| Other values (12) | 1497 | 1.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 3 | |
| 0 | 3 | |
| 1 | 1 | 14.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 11647 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 2921 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 297 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 12 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 12 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1578225 | |
| Common | 14896 | 0.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 272243 | |
| i | 271136 | |
| n | 139663 | |
| s | 133824 | |
| c | 133326 | |
| d | 129434 | |
| p | 128948 | |
| f | 128260 | |
| U | 128006 | |
| o | 15718 | 1.0% |
| Other values (37) | 97667 | 6.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 11647 | |
| / | 2921 | 19.6% |
| - | 297 | 2.0% |
| ( | 12 | 0.1% |
| ) | 12 | 0.1% |
| 8 | 3 | < 0.1% |
| 0 | 3 | < 0.1% |
| 1 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1593121 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 272243 | |
| i | 271136 | |
| n | 139663 | |
| s | 133824 | |
| c | 133326 | |
| d | 129434 | |
| p | 128948 | |
| f | 128260 | |
| U | 128006 | |
| o | 15718 | 1.0% |
| Other values (45) | 112563 |
CONTRIBUTING FACTOR VEHICLE 4
Categorical
| Distinct | 40 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1919228 |
| Missing (%) | 98.4% |
| Memory size | 14.9 MiB |

| Value | Count |
|---|---|
| Unspecified | 28695 |
| Other Vehicular | 542 |
| Following Too Closely | 340 |
| Driver Inattention/Distraction | 248 |
| Fatigued/Drowsy | 170 |
| Other values (35) | 407 |
Length
| Max length | 43 |
|---|---|
| Median length | 11 |
| Mean length | 11.483784 |
| Min length | 5 |
Characters and Unicode
| Total characters | 349130 |
|---|---|
| Distinct characters | 50 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 6 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Unspecified |
|---|---|
| 2nd row | Unspecified |
| 3rd row | Unspecified |
| 4th row | Unspecified |
| 5th row | Unspecified |
Common Values
| Value | Count | Frequency (%) |
| Unspecified | 28695 | 1.5% |
| Other Vehicular | 542 | < 0.1% |
| Following Too Closely | 340 | < 0.1% |
| Driver Inattention/Distraction | 248 | < 0.1% |
| Fatigued/Drowsy | 170 | < 0.1% |
| Pavement Slippery | 106 | < 0.1% |
| Reaction to Uninvolved Vehicle | 38 | < 0.1% |
| Outside Car Distraction | 27 | < 0.1% |
| Unsafe Speed | 26 | < 0.1% |
| Driver Inexperience | 24 | < 0.1% |
| Other values (30) | 186 | < 0.1% |
| (Missing) | 1919228 | 98.4% |
Words
| Value | Count | Frequency (%) |
| unspecified | 28695 | |
| other | 551 | 1.7% |
| vehicular | 542 | 1.7% |
| too | 345 | 1.1% |
| closely | 345 | 1.1% |
| following | 340 | 1.0% |
| driver | 272 | 0.8% |
| inattention/distraction | 248 | 0.8% |
| fatigued/drowsy | 170 | 0.5% |
| pavement | 109 | 0.3% |
| Other values (63) | 871 | 2.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 60593 | |
| i | 60060 | |
| n | 30541 | |
| c | 29741 | |
| s | 29724 | |
| d | 29035 | |
| p | 29018 | |
| f | 28810 | |
| U | 28783 | |
| o | 2743 | 0.8% |
| Other values (40) | 20082 | 5.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 313669 | |
| Uppercase Letter | 32883 | 9.4% |
| Space Separator | 2086 | 0.6% |
| Other Punctuation | 450 | 0.1% |
| Dash Punctuation | 34 | < 0.1% |
| Open Punctuation | 4 | < 0.1% |
| Close Punctuation | 4 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 60593 | |
| i | 60060 | |
| n | 30541 | |
| c | 29741 | |
| s | 29724 | |
| d | 29035 | |
| p | 29018 | |
| f | 28810 | |
| o | 2743 | 0.9% |
| r | 2490 | 0.8% |
| Other values (14) | 10914 | 3.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 28783 | |
| D | 794 | 2.4% |
| O | 599 | 1.8% |
| V | 585 | 1.8% |
| F | 554 | 1.7% |
| C | 403 | 1.2% |
| T | 374 | 1.1% |
| I | 313 | 1.0% |
| S | 133 | 0.4% |
| P | 130 | 0.4% |
| Other values (11) | 215 | 0.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2086 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 450 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 34 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 4 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 4 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 346552 | |
| Common | 2578 | 0.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 60593 | |
| i | 60060 | |
| n | 30541 | |
| c | 29741 | |
| s | 29724 | |
| d | 29035 | |
| p | 29018 | |
| f | 28810 | |
| U | 28783 | |
| o | 2743 | 0.8% |
| Other values (35) | 17504 | 5.1% |
Common
| Value | Count | Frequency (%) |
| (space) | 2086 | |
| / | 450 | 17.5% |
| - | 34 | 1.3% |
| ( | 4 | 0.2% |
| ) | 4 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 349130 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 60593 | |
| i | 60060 | |
| n | 30541 | |
| c | 29741 | |
| s | 29724 | |
| d | 29035 | |
| p | 29018 | |
| f | 28810 | |
| U | 28783 | |
| o | 2743 | 0.8% |
| Other values (40) | 20082 | 5.8% |
CONTRIBUTING FACTOR VEHICLE 5
Categorical
| Distinct | 29 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 1941480 |
| Missing (%) | 99.6% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Unspecified | 7686 |
| Other Vehicular | 158 |
| Following Too Closely | 81 |
| Driver Inattention/Distraction | 60 |
| Pavement Slippery | 44 |
| Other values (24) | 121 |
Length
| Max length | 43 |
|---|---|
| Median length | 11 |
| Mean length | 11.467362 |
| Min length | 5 |
Characters and Unicode
| Total characters | 93459 |
|---|---|
| Distinct characters | 49 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 10 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Unspecified |
|---|---|
| 2nd row | Unspecified |
| 3rd row | Unspecified |
| 4th row | Unspecified |
| 5th row | Unspecified |
Common Values
| Value | Count | Frequency (%) |
| Unspecified | 7686 | 0.4% |
| Other Vehicular | 158 | < 0.1% |
| Following Too Closely | 81 | < 0.1% |
| Driver Inattention/Distraction | 60 | < 0.1% |
| Pavement Slippery | 44 | < 0.1% |
| Fatigued/Drowsy | 41 | < 0.1% |
| Reaction to Uninvolved Vehicle | 11 | < 0.1% |
| Alcohol Involvement | 10 | < 0.1% |
| Driver Inexperience | 9 | < 0.1% |
| Unsafe Speed | 7 | < 0.1% |
| Other values (19) | 43 | < 0.1% |
| (Missing) | 1941480 |
Words
| Value | Count | Frequency (%) |
| unspecified | 7686 | |
| other | 160 | 1.8% |
| vehicular | 158 | 1.8% |
| too | 83 | 1.0% |
| closely | 83 | 1.0% |
| following | 81 | 0.9% |
| driver | 69 | 0.8% |
| inattention/distraction | 60 | 0.7% |
| pavement | 45 | 0.5% |
| slippery | 44 | 0.5% |
| Other values (46) | 232 | 2.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 16277 | |
| i | 16069 | |
| n | 8162 | |
| c | 7973 | |
| s | 7926 | |
| p | 7799 | |
| d | 7765 | |
| f | 7709 | |
| U | 7706 | |
| o | 677 | 0.7% |
| Other values (39) | 5396 | 5.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 83985 | |
| Uppercase Letter | 8797 | 9.4% |
| Space Separator | 551 | 0.6% |
| Other Punctuation | 111 | 0.1% |
| Dash Punctuation | 11 | < 0.1% |
| Open Punctuation | 2 | < 0.1% |
| Close Punctuation | 2 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 16277 | |
| i | 16069 | |
| n | 8162 | |
| c | 7973 | |
| s | 7926 | |
| p | 7799 | |
| d | 7765 | |
| f | 7709 | |
| o | 677 | 0.8% |
| r | 677 | 0.8% |
| Other values (14) | 2951 | 3.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 7706 | |
| D | 195 | 2.2% |
| O | 174 | 2.0% |
| V | 169 | 1.9% |
| F | 134 | 1.5% |
| C | 94 | 1.1% |
| T | 88 | 1.0% |
| I | 83 | 0.9% |
| S | 52 | 0.6% |
| P | 48 | 0.5% |
| Other values (10) | 54 | 0.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 551 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 111 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 11 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 92782 | |
| Common | 677 | 0.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 16277 | |
| i | 16069 | |
| n | 8162 | |
| c | 7973 | |
| s | 7926 | |
| p | 7799 | |
| d | 7765 | |
| f | 7709 | |
| U | 7706 | |
| o | 677 | 0.7% |
| Other values (34) | 4719 | 5.1% |
Common
| Value | Count | Frequency (%) |
| (space) | 551 | |
| / | 111 | 16.4% |
| - | 11 | 1.6% |
| ( | 2 | 0.3% |
| ) | 2 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 93459 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 16277 | |
| i | 16069 | |
| n | 8162 | |
| c | 7973 | |
| s | 7926 | |
| p | 7799 | |
| d | 7765 | |
| f | 7709 | |
| U | 7706 | |
| o | 677 | 0.7% |
| Other values (39) | 5396 | 5.8% |
COLLISION_ID
Numeric
| Distinct | 1949630 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3063521.4 |
| Minimum | 22 |
|---|---|
| Maximum | 4586417 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.9 MiB |
Quantile statistics
| Minimum | 22 |
|---|---|
| 5-th percentile | 98314.45 |
| Q1 | 3122119.2 |
| median | 3611018.5 |
| Q3 | 4098681.8 |
| 95-th percentile | 4488695.5 |
| Maximum | 4586417 |
| Range | 4586395 |
| Interquartile range (IQR) | 976562.5 |
Descriptive statistics
| Standard deviation | 1503058.7 |
|---|---|
| Coefficient of variation (CV) | 0.49063103 |
| Kurtosis | -0.2122234 |
| Mean | 3063521.4 |
| Median Absolute Deviation (MAD) | 488282 |
| Skewness | -1.1788488 |
| Sum | 5.9727332 × 10^12 |
| Variance | 2.2591853 × 10^12 |
| Monotonicity | Not monotonic |
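Several of the statistics above are derived from others in the same tables. As a quick consistency check, the following minimal sketch recomputes them from the rounded values shown in this report. The variable names are illustrative only, and small differences from the reported figures are expected because the inputs are already rounded.

```python
# Values copied from the COLLISION_ID tables above (already rounded by the report).
mean = 3063521.4
std = 1503058.7
q1, q3 = 3122119.2, 4098681.8
minimum, maximum = 22, 4586417

cv = std / mean                  # coefficient of variation = std / mean
iqr = q3 - q1                    # interquartile range = Q3 - Q1
value_range = maximum - minimum  # range = max - min
variance = std ** 2              # variance = std squared

print(f"CV       = {cv:.6f}")
print(f"IQR      = {iqr:.1f}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.4e}")
```

The recomputed IQR differs from the reported 976562.5 only in the last digit, which is consistent with the quartiles having been rounded to one decimal place before display.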
Common Values
| Value | Count | Frequency (%) |
| 4455765 | 1 | < 0.1% |
| 3269862 | 1 | < 0.1% |
| 3265648 | 1 | < 0.1% |
| 3276194 | 1 | < 0.1% |
| 3266120 | 1 | < 0.1% |
| 3274634 | 1 | < 0.1% |
| 3274205 | 1 | < 0.1% |
| 3267013 | 1 | < 0.1% |
| 3266847 | 1 | < 0.1% |
| 3273878 | 1 | < 0.1% |
| Other values (1949620) | 1949620 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 22 | 1 | |
| 23 | 1 | |
| 24 | 1 | |
| 25 | 1 | |
| 26 | 1 | |
| 27 | 1 | |
| 28 | 1 | |
| 29 | 1 | |
| 30 | 1 | |
| 31 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4586417 | 1 | |
| 4586409 | 1 | |
| 4586408 | 1 | |
| 4586407 | 1 | |
| 4586403 | 1 | |
| 4586396 | 1 | |
| 4586395 | 1 | |
| 4586394 | 1 | |
| 4586388 | 1 | |
| 4586386 | 1 |
VEHICLE TYPE CODE 1
Categorical
| Distinct | 1450 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 11591 |
| Missing (%) | 0.6% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Sedan | 518915 |
| PASSENGER VEHICLE | 416206 |
| Station Wagon/Sport Utility Vehicle | 409896 |
| SPORT UTILITY / STATION WAGON | 180291 |
| Taxi | 48192 |
| Other values (1445) | |
Length
| Max length | 38 |
|---|---|
| Median length | 30 |
| Mean length | 16.938741 |
| Min length | 1 |
Characters and Unicode
| Total characters | 32827941 |
|---|---|
| Distinct characters | 75 |
| Distinct categories | 11 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 865 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Sedan |
|---|---|
| 2nd row | Sedan |
| 3rd row | Sedan |
| 4th row | Sedan |
| 5th row | Dump |
Common Values
| Value | Count | Frequency (%) |
| Sedan | 518915 | |
| PASSENGER VEHICLE | 416206 | |
| Station Wagon/Sport Utility Vehicle | 409896 | |
| SPORT UTILITY / STATION WAGON | 180291 | 9.2% |
| Taxi | 48192 | 2.5% |
| 4 dr sedan | 40135 | 2.1% |
| TAXI | 31911 | 1.6% |
| Pick-up Truck | 31836 | 1.6% |
| VAN | 25266 | 1.3% |
| OTHER | 22967 | 1.2% |
| Other values (1440) | 212424 |
Words
| Value | Count | Frequency (%) |
| vehicle | 836673 | |
| utility | 590216 | |
| station | 590187 | |
| sedan | 561711 | |
| passenger | 416215 | |
| wagon/sport | 409896 | |
| | 181483 | 3.9% |
| wagon | 180345 | 3.9% |
| sport | 180291 | 3.9% |
| taxi | 80105 | 1.7% |
| Other values (876) | 590717 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2693020 | 8.2% |
| S | 2589747 | 7.9% |
| t | 2078915 | 6.3% |
| E | 1816789 | 5.5% |
| i | 1752600 | 5.3% |
| a | 1467890 | 4.5% |
| e | 1455005 | 4.4% |
| n | 1401162 | 4.3% |
| o | 1295948 | 3.9% |
| T | 1130789 | 3.4% |
| Other values (65) | 15146076 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 15231229 | |
| Lowercase Letter | 14138588 | |
| Space Separator | 2693020 | 8.2% |
| Other Punctuation | 591426 | 1.8% |
| Decimal Number | 70905 | 0.2% |
| Dash Punctuation | 47541 | 0.1% |
| Open Punctuation | 27615 | 0.1% |
| Close Punctuation | 27613 | 0.1% |
| Modifier Symbol | 2 | < 0.1% |
| Other Symbol | 1 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 2589747 | |
| E | 1816789 | |
| T | 1130789 | 7.4% |
| I | 1051859 | 6.9% |
| V | 909082 | 6.0% |
| A | 874102 | 5.7% |
| N | 865163 | 5.7% |
| R | 723060 | 4.7% |
| L | 667346 | 4.4% |
| P | 654550 | 4.3% |
| Other values (16) | 3948742 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 2078915 | |
| i | 1752600 | |
| a | 1467890 | |
| e | 1455005 | |
| n | 1401162 | |
| o | 1295948 | |
| l | 852499 | |
| d | 608967 | 4.3% |
| r | 570205 | 4.0% |
| c | 542813 | 3.8% |
| Other values (15) | 2112584 |
Decimal Number
| Value | Count | Frequency (%) |
| 4 | 53373 | |
| 6 | 14402 | 20.3% |
| 2 | 2674 | 3.8% |
| 3 | 303 | 0.4% |
| 1 | 47 | 0.1% |
| 5 | 39 | 0.1% |
| 0 | 31 | < 0.1% |
| 9 | 20 | < 0.1% |
| 8 | 9 | < 0.1% |
| 7 | 7 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 591403 | |
| . | 12 | < 0.1% |
| # | 4 | < 0.1% |
| , | 3 | < 0.1% |
| ' | 2 | < 0.1% |
| & | 1 | < 0.1% |
| ? | 1 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2693020 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 47541 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 27615 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 27613 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ` | 2 |
Other Symbol
| Value | Count | Frequency (%) |
| � | 1 |
Control
| Value | Count | Frequency (%) |
| | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 29369817 | |
| Common | 3458124 | 10.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 2589747 | 8.8% |
| t | 2078915 | 7.1% |
| E | 1816789 | 6.2% |
| i | 1752600 | 6.0% |
| a | 1467890 | 5.0% |
| e | 1455005 | 5.0% |
| n | 1401162 | 4.8% |
| o | 1295948 | 4.4% |
| T | 1130789 | 3.9% |
| I | 1051859 | 3.6% |
| Other values (41) | 13329113 |
Common
| Value | Count | Frequency (%) |
| (space) | 2693020 | |
| / | 591403 | 17.1% |
| 4 | 53373 | 1.5% |
| - | 47541 | 1.4% |
| ( | 27615 | 0.8% |
| ) | 27613 | 0.8% |
| 6 | 14402 | 0.4% |
| 2 | 2674 | 0.1% |
| 3 | 303 | < 0.1% |
| 1 | 47 | < 0.1% |
| Other values (14) | 133 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 32827940 | |
| Specials | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2693020 | 8.2% |
| S | 2589747 | 7.9% |
| t | 2078915 | 6.3% |
| E | 1816789 | 5.5% |
| i | 1752600 | 5.3% |
| a | 1467890 | 4.5% |
| e | 1455005 | 4.4% |
| n | 1401162 | 4.3% |
| o | 1295948 | 3.9% |
| T | 1130789 | 3.4% |
| Other values (64) | 15146075 |
Specials
| Value | Count | Frequency (%) |
| � | 1 |
VEHICLE TYPE CODE 2
Categorical
| Distinct | 1622 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 353893 |
| Missing (%) | 18.2% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Sedan | 370374 |
| PASSENGER VEHICLE | 318607 |
| Station Wagon/Sport Utility Vehicle | 301180 |
| SPORT UTILITY / STATION WAGON | 140204 |
| UNKNOWN | 81487 |
| Other values (1617) | |
Length
| Max length | 38 |
|---|---|
| Median length | 30 |
| Mean length | 16.145074 |
| Min length | 1 |
Characters and Unicode
| Total characters | 25763292 |
|---|---|
| Distinct characters | 72 |
| Distinct categories | 9 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 962 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Sedan |
|---|---|
| 2nd row | Pick-up Truck |
| 3rd row | Sedan |
| 4th row | Tractor Truck Diesel |
| 5th row | Sedan |
Common Values
| Value | Count | Frequency (%) |
| Sedan | 370374 | |
| PASSENGER VEHICLE | 318607 | |
| Station Wagon/Sport Utility Vehicle | 301180 | |
| SPORT UTILITY / STATION WAGON | 140204 | 7.2% |
| UNKNOWN | 81487 | 4.2% |
| Taxi | 35848 | 1.8% |
| 4 dr sedan | 30069 | 1.5% |
| Pick-up Truck | 29118 | 1.5% |
| TAXI | 27702 | 1.4% |
| Bike | 27177 | 1.4% |
| Other values (1612) | 233971 | |
| (Missing) | 353893 |
Words
| Value | Count | Frequency (%) |
| vehicle | 628358 | |
| utility | 441402 | |
| station | 441384 | |
| sedan | 402386 | |
| passenger | 318609 | |
| wagon/sport | 301180 | |
| | 141364 | 3.9% |
| wagon | 140252 | 3.8% |
| sport | 140204 | 3.8% |
| unknown | 81557 | 2.2% |
| Other values (920) | 625895 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2079819 | 8.1% |
| S | 1945863 | 7.6% |
| t | 1534332 | 6.0% |
| E | 1435030 | 5.6% |
| i | 1318066 | 5.1% |
| e | 1090694 | 4.2% |
| a | 1076590 | 4.2% |
| n | 1021750 | 4.0% |
| o | 971714 | 3.8% |
| T | 910049 | 3.5% |
| Other values (62) | 12379385 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 12465872 | |
| Lowercase Letter | 10615612 | |
| Space Separator | 2079819 | 8.1% |
| Other Punctuation | 442606 | 1.7% |
| Decimal Number | 59094 | 0.2% |
| Dash Punctuation | 46987 | 0.2% |
| Open Punctuation | 26651 | 0.1% |
| Close Punctuation | 26649 | 0.1% |
| Modifier Symbol | 2 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1945863 | |
| E | 1435030 | |
| T | 910049 | 7.3% |
| N | 869097 | 7.0% |
| I | 841925 | 6.8% |
| V | 694860 | 5.6% |
| A | 684573 | 5.5% |
| O | 587595 | 4.7% |
| R | 577370 | 4.6% |
| U | 559621 | 4.5% |
| Other values (16) | 3359889 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 1534332 | |
| i | 1318066 | |
| e | 1090694 | |
| a | 1076590 | |
| n | 1021750 | |
| o | 971714 | |
| l | 631114 | 5.9% |
| r | 449414 | 4.2% |
| d | 439383 | 4.1% |
| c | 428223 | 4.0% |
| Other values (15) | 1654332 |
Decimal Number
| Value | Count | Frequency (%) |
| 4 | 43049 | |
| 6 | 13694 | 23.2% |
| 2 | 1955 | 3.3% |
| 3 | 265 | 0.4% |
| 0 | 51 | 0.1% |
| 1 | 37 | 0.1% |
| 5 | 27 | < 0.1% |
| 9 | 8 | < 0.1% |
| 8 | 6 | < 0.1% |
| 7 | 2 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 442588 | |
| . | 9 | < 0.1% |
| ' | 3 | < 0.1% |
| , | 2 | < 0.1% |
| # | 2 | < 0.1% |
| ? | 2 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2079819 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 46987 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 26651 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 26649 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ` | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 23081484 | |
| Common | 2681808 | 10.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 1945863 | 8.4% |
| t | 1534332 | 6.6% |
| E | 1435030 | 6.2% |
| i | 1318066 | 5.7% |
| e | 1090694 | 4.7% |
| a | 1076590 | 4.7% |
| n | 1021750 | 4.4% |
| o | 971714 | 4.2% |
| T | 910049 | 3.9% |
| N | 869097 | 3.8% |
| Other values (41) | 10908299 |
Common
| Value | Count | Frequency (%) |
| (space) | 2079819 | |
| / | 442588 | 16.5% |
| - | 46987 | 1.8% |
| 4 | 43049 | 1.6% |
| ( | 26651 | 1.0% |
| ) | 26649 | 1.0% |
| 6 | 13694 | 0.5% |
| 2 | 1955 | 0.1% |
| 3 | 265 | < 0.1% |
| 0 | 51 | < 0.1% |
| Other values (11) | 100 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 25763292 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2079819 | 8.1% |
| S | 1945863 | 7.6% |
| t | 1534332 | 6.0% |
| E | 1435030 | 5.6% |
| i | 1318066 | 5.1% |
| e | 1090694 | 4.2% |
| a | 1076590 | 4.2% |
| n | 1021750 | 4.0% |
| o | 971714 | 3.8% |
| T | 910049 | 3.5% |
| Other values (62) | 12379385 |
VEHICLE TYPE CODE 3
Categorical
| Distinct | 230 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 1817465 |
| Missing (%) | 93.2% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Sedan | 39284 |
| Station Wagon/Sport Utility Vehicle | 31740 |
| PASSENGER VEHICLE | 27713 |
| SPORT UTILITY / STATION WAGON | 13358 |
| UNKNOWN | 3283 |
| Other values (225) | |
Length
| Max length | 35 |
|---|---|
| Median length | 30 |
| Mean length | 17.685234 |
| Min length | 2 |
Characters and Unicode
| Total characters | 2337369 |
|---|---|
| Distinct characters | 61 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 133 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Sedan |
|---|---|
| 2nd row | Station Wagon/Sport Utility Vehicle |
| 3rd row | Sedan |
| 4th row | Sedan |
| 5th row | Sedan |
Common Values
| Value | Count | Frequency (%) |
| Sedan | 39284 | 2.0% |
| Station Wagon/Sport Utility Vehicle | 31740 | 1.6% |
| PASSENGER VEHICLE | 27713 | 1.4% |
| SPORT UTILITY / STATION WAGON | 13358 | 0.7% |
| UNKNOWN | 3283 | 0.2% |
| 4 dr sedan | 2561 | 0.1% |
| Taxi | 2053 | 0.1% |
| Pick-up Truck | 1984 | 0.1% |
| VAN | 1366 | 0.1% |
| OTHER | 1045 | 0.1% |
| Other values (220) | 7778 | 0.4% |
| (Missing) | 1817465 |
Words
| Value | Count | Frequency (%) |
| vehicle | 59889 | |
| utility | 45100 | |
| station | 45099 | |
| sedan | 42028 | |
| wagon/sport | 31740 | |
| passenger | 27715 | |
| | 13429 | 4.2% |
| sport | 13358 | 4.1% |
| wagon | 13358 | 4.1% |
| truck | 3795 | 1.2% |
| Other values (187) | 27081 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 190862 | 8.2% |
| S | 186706 | 8.0% |
| t | 159926 | 6.8% |
| i | 132163 | 5.7% |
| E | 116348 | 5.0% |
| a | 108689 | 4.7% |
| e | 108172 | 4.6% |
| n | 106231 | 4.5% |
| o | 97774 | 4.2% |
| T | 76222 | 3.3% |
| Other values (51) | 1054276 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1048847 | |
| Uppercase Letter | 1044394 | |
| Space Separator | 190862 | 8.2% |
| Other Punctuation | 45170 | 1.9% |
| Decimal Number | 3634 | 0.2% |
| Dash Punctuation | 2710 | 0.1% |
| Open Punctuation | 876 | < 0.1% |
| Close Punctuation | 876 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 186706 | |
| E | 116348 | |
| T | 76222 | 7.3% |
| I | 71388 | 6.8% |
| N | 65699 | 6.3% |
| V | 62915 | 6.0% |
| A | 57887 | 5.5% |
| U | 50119 | 4.8% |
| W | 48475 | 4.6% |
| O | 46568 | 4.5% |
| Other values (15) | 262067 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 159926 | |
| i | 132163 | |
| a | 108689 | |
| e | 108172 | |
| n | 106231 | |
| o | 97774 | |
| l | 64685 | |
| d | 44937 | 4.3% |
| r | 39465 | 3.8% |
| c | 38159 | 3.6% |
| Other values (14) | 148646 |
Decimal Number
| Value | Count | Frequency (%) |
| 4 | 2996 | |
| 6 | 442 | 12.2% |
| 2 | 184 | 5.1% |
| 3 | 9 | 0.2% |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 190862 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 45170 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2710 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 876 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 876 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2093241 | |
| Common | 244128 | 10.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 186706 | 8.9% |
| t | 159926 | 7.6% |
| i | 132163 | 6.3% |
| E | 116348 | 5.6% |
| a | 108689 | 5.2% |
| e | 108172 | 5.2% |
| n | 106231 | 5.1% |
| o | 97774 | 4.7% |
| T | 76222 | 3.6% |
| I | 71388 | 3.4% |
| Other values (39) | 929622 |
Common
| Value | Count | Frequency (%) |
| (space) | 190862 | |
| / | 45170 | 18.5% |
| 4 | 2996 | 1.2% |
| - | 2710 | 1.1% |
| ( | 876 | 0.4% |
| ) | 876 | 0.4% |
| 6 | 442 | 0.2% |
| 2 | 184 | 0.1% |
| 3 | 9 | < 0.1% |
| 0 | 1 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2337369 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 190862 | 8.2% |
| S | 186706 | 8.0% |
| t | 159926 | 6.8% |
| i | 132163 | 5.7% |
| E | 116348 | 5.0% |
| a | 108689 | 4.7% |
| e | 108172 | 4.6% |
| n | 106231 | 4.5% |
| o | 97774 | 4.2% |
| T | 76222 | 3.3% |
| Other values (51) | 1054276 |
VEHICLE TYPE CODE 4
Categorical
| Distinct | 91 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 1920206 |
| Missing (%) | 98.5% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Sedan | 9375 |
| Station Wagon/Sport Utility Vehicle | 7627 |
| PASSENGER VEHICLE | 5969 |
| SPORT UTILITY / STATION WAGON | 2852 |
| UNKNOWN | 595 |
| Other values (86) | |
Length
| Max length | 35 |
|---|---|
| Median length | 30 |
| Mean length | 17.954051 |
| Min length | 2 |
Characters and Unicode
| Total characters | 528280 |
|---|---|
| Distinct characters | 57 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 37 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Station Wagon/Sport Utility Vehicle |
|---|---|
| 2nd row | Sedan |
| 3rd row | Station Wagon/Sport Utility Vehicle |
| 4th row | Sedan |
| 5th row | Sedan |
Common Values
| Value | Count | Frequency (%) |
| Sedan | 9375 | 0.5% |
| Station Wagon/Sport Utility Vehicle | 7627 | 0.4% |
| PASSENGER VEHICLE | 5969 | 0.3% |
| SPORT UTILITY / STATION WAGON | 2852 | 0.1% |
| UNKNOWN | 595 | < 0.1% |
| 4 dr sedan | 566 | < 0.1% |
| Pick-up Truck | 418 | < 0.1% |
| Taxi | 408 | < 0.1% |
| VAN | 242 | < 0.1% |
| OTHER | 189 | < 0.1% |
| Other values (81) | 1183 | 0.1% |
| (Missing) | 1920206 |
Words
| Value | Count | Frequency (%) |
| vehicle | 13652 | |
| station | 10479 | |
| utility | 10479 | |
| sedan | 9984 | |
| wagon/sport | 7627 | |
| passenger | 5969 | |
| | 2858 | 4.0% |
| sport | 2852 | 3.9% |
| wagon | 2852 | 3.9% |
| truck | 686 | 0.9% |
| Other values (91) | 4780 | 6.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 42850 | 8.1% |
| S | 42513 | 8.0% |
| t | 38318 | 7.3% |
| i | 31472 | 6.0% |
| a | 25823 | 4.9% |
| e | 25619 | 4.8% |
| n | 25355 | 4.8% |
| E | 24660 | 4.7% |
| o | 23274 | 4.4% |
| T | 15902 | 3.0% |
| Other values (47) | 232494 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 248525 | |
| Uppercase Letter | 224914 | |
| Space Separator | 42850 | 8.1% |
| Other Punctuation | 10485 | 2.0% |
| Decimal Number | 725 | 0.1% |
| Dash Punctuation | 553 | 0.1% |
| Open Punctuation | 114 | < 0.1% |
| Close Punctuation | 114 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 42513 | |
| E | 24660 | |
| T | 15902 | 7.1% |
| I | 15042 | 6.7% |
| V | 14124 | 6.3% |
| N | 13718 | 6.1% |
| A | 12211 | 5.4% |
| U | 11365 | 5.1% |
| W | 11085 | 4.9% |
| O | 9648 | 4.3% |
| Other values (14) | 54646 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 38318 | |
| i | 31472 | |
| a | 25823 | |
| e | 25619 | |
| n | 25355 | |
| o | 23274 | |
| l | 15452 | |
| d | 10617 | 4.3% |
| r | 9095 | 3.7% |
| c | 8819 | 3.5% |
| Other values (13) | 34681 |
Decimal Number
| Value | Count | Frequency (%) |
| 4 | 622 | |
| 6 | 58 | 8.0% |
| 2 | 42 | 5.8% |
| 3 | 2 | 0.3% |
| 5 | 1 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 42850 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 10485 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 553 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 114 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 114 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 473439 | |
| Common | 54841 | 10.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 42513 | 9.0% |
| t | 38318 | 8.1% |
| i | 31472 | 6.6% |
| a | 25823 | 5.5% |
| e | 25619 | 5.4% |
| n | 25355 | 5.4% |
| E | 24660 | 5.2% |
| o | 23274 | 4.9% |
| T | 15902 | 3.4% |
| l | 15452 | 3.3% |
| Other values (37) | 205051 |
Common
| Value | Count | Frequency (%) |
| (space) | 42850 | |
| / | 10485 | 19.1% |
| 4 | 622 | 1.1% |
| - | 553 | 1.0% |
| ( | 114 | 0.2% |
| ) | 114 | 0.2% |
| 6 | 58 | 0.1% |
| 2 | 42 | 0.1% |
| 3 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 528280 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 42850 | 8.1% |
| S | 42513 | 8.0% |
| t | 38318 | 7.3% |
| i | 31472 | 6.0% |
| a | 25823 | 4.9% |
| e | 25619 | 4.8% |
| n | 25355 | 4.8% |
| E | 24660 | 4.7% |
| o | 23274 | 4.4% |
| T | 15902 | 3.0% |
| Other values (47) | 232494 |
VEHICLE TYPE CODE 5
Categorical
| Distinct | 63 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 1941717 |
| Missing (%) | 99.6% |
| Memory size | 14.9 MiB |
| Value | Count |
|---|---|
| Sedan | 2602 |
| Station Wagon/Sport Utility Vehicle | 2145 |
| PASSENGER VEHICLE | 1487 |
| SPORT UTILITY / STATION WAGON | 802 |
| Pick-up Truck | 137 |
| Other values (58) | |
Length
| Max length | 35 |
|---|---|
| Median length | 30 |
| Mean length | 18.219133 |
| Min length | 2 |
Characters and Unicode
| Total characters | 144168 |
|---|---|
| Distinct characters | 54 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 24 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | Station Wagon/Sport Utility Vehicle |
|---|---|
| 2nd row | Station Wagon/Sport Utility Vehicle |
| 3rd row | Sedan |
| 4th row | Sedan |
| 5th row | Station Wagon/Sport Utility Vehicle |
Common Values
| Value | Count | Frequency (%) |
| Sedan | 2602 | 0.1% |
| Station Wagon/Sport Utility Vehicle | 2145 | 0.1% |
| PASSENGER VEHICLE | 1487 | 0.1% |
| SPORT UTILITY / STATION WAGON | 802 | < 0.1% |
| Pick-up Truck | 137 | < 0.1% |
| 4 dr sedan | 123 | < 0.1% |
| Taxi | 98 | < 0.1% |
| UNKNOWN | 94 | < 0.1% |
| VAN | 50 | < 0.1% |
| OTHER | 49 | < 0.1% |
| Other values (53) | 326 | < 0.1% |
| (Missing) | 1941717 |
Words
| Value | Count | Frequency (%) |
| vehicle | 3641 | |
| station | 2947 | |
| utility | 2947 | |
| sedan | 2739 | |
| wagon/sport | 2145 | |
| passenger | 1487 | |
| | 804 | 4.1% |
| wagon | 804 | 4.1% |
| sport | 802 | 4.1% |
| truck | 222 | 1.1% |
| Other values (57) | 1129 | 5.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 11764 | 8.2% |
| S | 11517 | 8.0% |
| t | 10785 | 7.5% |
| i | 8862 | 6.1% |
| a | 7182 | 5.0% |
| e | 7138 | 5.0% |
| n | 7077 | 4.9% |
| o | 6568 | 4.6% |
| E | 6124 | 4.2% |
| T | 4465 | 3.1% |
| Other values (44) | 62686 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 69752 | |
| Uppercase Letter | 59321 | |
| Space Separator | 11764 | 8.2% |
| Other Punctuation | 2949 | 2.0% |
| Dash Punctuation | 175 | 0.1% |
| Decimal Number | 161 | 0.1% |
| Close Punctuation | 23 | < 0.1% |
| Open Punctuation | 23 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 11517 | |
| E | 6124 | |
| T | 4465 | 7.5% |
| I | 4007 | 6.8% |
| V | 3746 | 6.3% |
| N | 3428 | 5.8% |
| A | 3209 | 5.4% |
| U | 3114 | 5.2% |
| W | 3046 | 5.1% |
| O | 2624 | 4.4% |
| Other values (13) | 14041 |
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 10785 | |
| i | 8862 | |
| a | 7182 | |
| e | 7138 | |
| n | 7077 | |
| o | 6568 | |
| l | 4347 | |
| d | 2884 | 4.1% |
| r | 2557 | 3.7% |
| c | 2538 | 3.6% |
| Other values (12) | 9814 |
Decimal Number
| Value | Count | Frequency (%) |
| 4 | 133 | |
| 2 | 14 | 8.7% |
| 6 | 13 | 8.1% |
| 3 | 1 | 0.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 11764 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 2949 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 175 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 23 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 23 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 129073 | |
| Common | 15095 | 10.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 11517 | 8.9% |
| t | 10785 | 8.4% |
| i | 8862 | 6.9% |
| a | 7182 | 5.6% |
| e | 7138 | 5.5% |
| n | 7077 | 5.5% |
| o | 6568 | 5.1% |
| E | 6124 | 4.7% |
| T | 4465 | 3.5% |
| l | 4347 | 3.4% |
| Other values (35) | 55008 |
Common
| Value | Count | Frequency (%) |
| (space) | 11764 | |
| / | 2949 | 19.5% |
| - | 175 | 1.2% |
| 4 | 133 | 0.9% |
| ) | 23 | 0.2% |
| ( | 23 | 0.2% |
| 2 | 14 | 0.1% |
| 6 | 13 | 0.1% |
| 3 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 144168 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 11764 | 8.2% |
| S | 11517 | 8.0% |
| t | 10785 | 7.5% |
| i | 8862 | 6.1% |
| a | 7182 | 5.0% |
| e | 7138 | 5.0% |
| n | 7077 | 4.9% |
| o | 6568 | 4.6% |
| E | 6124 | 4.2% |
| T | 4465 | 3.1% |
| Other values (44) | 62686 |
Auto
The auto setting is an interpretable pairwise column metric that uses the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
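The mapping above amounts to a lookup on the pair of column types. The following is a minimal sketch of that lookup, not the report generator's actual code; the function name `auto_metric` and the type labels are assumptions taken from the list above.

```python
from typing import Tuple


def auto_metric(type_a: str, type_b: str) -> Tuple[str, Tuple[int, int]]:
    """Return (method, range) for a pair of column types, per the mapping above.

    Hypothetical helper for illustration; the labels "Numerical" and
    "Categorical" follow the wording of the list in this report.
    """
    kinds = {type_a, type_b}
    if kinds == {"Numerical"}:
        # Numerical-Numerical pairs use Spearman's rho.
        return "Spearman's rho", (-1, 1)
    if kinds <= {"Numerical", "Categorical"}:
        # Any pair involving a categorical column uses Cramér's V; a
        # numerical partner would first be discretized into bins.
        return "Cramér's V", (0, 1)
    raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")


print(auto_metric("Numerical", "Categorical"))
```

Because the order of the two columns does not matter, the sketch compares sets of types rather than ordered pairs.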
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
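The calculation described above can be sketched in plain Python as Pearson's formula applied to ranks. This is an illustrative implementation, not the one used to build this report; the helper names and the toy data are assumptions, and ties receive the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

On this toy data the rank correlation is 1 (up to floating-point error) while Pearson's r stays below 1, which is the nonlinear monotonic case the paragraph refers to.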
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
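As a small worked example of the formula and of the invariance property, the sketch below computes r on toy data and again after shifting and rescaling each variable separately. The function name and the data are illustrative assumptions. The scale factors used here are positive; a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Change location and scale of each variable independently.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_transformed)
```

Both calls give the same value (0.8 for this data, up to floating-point error), since the shifts cancel in the centered terms and the scale factors cancel between numerator and denominator.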
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
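The pair-counting definition above can be sketched directly. This is the simplest variant (often called τ-a), which divides by the total number of pairs and does not adjust for ties; the implementation behind this report may use a tie-adjusted variant, so treat the function below as an illustration with assumed names and toy data.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1  # both variables order the pair the same way
        elif direction < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither.
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

For this data there are six pairs, five concordant and one discordant, so τ = (5 - 1) / 6.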
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

| | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN |
| 1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Pavement Slippery | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN |
| 2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN |
| 3 | 09/11/2021 | 9:35 | BROOKLYN | 11208.0 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN |
| 4 | 12/14/2021 | 8:13 | BROOKLYN | 11233.0 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN |
| 5 | 04/14/2021 | 12:47 | NaN | NaN | NaN | NaN | NaN | MAJOR DEEGAN EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4407458 | Dump | Sedan | NaN | NaN | NaN |
| 6 | 12/14/2021 | 17:05 | NaN | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN |
| 7 | 12/14/2021 | 8:17 | BRONX | 10475.0 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4486660 | Sedan | Sedan | NaN | NaN | NaN |
| 8 | 12/14/2021 | 21:10 | BROOKLYN | 11207.0 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | NaN | NaN | NaN | 4487074 | Sedan | NaN | NaN | NaN | NaN |
| 9 | 12/14/2021 | 14:58 | MANHATTAN | 10017.0 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN |

| | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1949620 | 11/28/2022 | 9:20 | BROOKLYN | 11206.0 | 40.699790 | -73.950096 | (40.69979, -73.950096) | MARCY AVENUE | FLUSHING AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Driver Inattention/Distraction | Unspecified | NaN | NaN | 4586261 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Bus | NaN | NaN |
| 1949621 | 11/29/2022 | 21:41 | QUEENS | 11354.0 | 40.764650 | -73.823494 | (40.76465, -73.823494) | NORTHERN BOULEVARD | PARSONS BOULEVARD | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4585945 | Station Wagon/Sport Utility Vehicle | Motorcycle | NaN | NaN | NaN |
| 1949622 | 11/29/2022 | 13:05 | QUEENS | 11434.0 | 40.667522 | -73.780630 | (40.667522, -73.78063) | NORTH CONDUIT AVENUE | ROCKAWAY BOULEVARD | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Unsafe Lane Changing | Unsafe Speed | NaN | NaN | NaN | 4586024 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN |
| 1949623 | 11/13/2022 | 14:45 | NaN | NaN | NaN | NaN | NaN | TRIBOROUGH BRIDGE | NaN | NaN | 5.0 | 0.0 | 0 | 0 | 0 | 0 | 5 | 0 | Following Too Closely | Following Too Closely | Unspecified | NaN | NaN | 4586350 | Sedan | Station Wagon/Sport Utility Vehicle | Taxi | NaN | NaN |
| 1949624 | 11/29/2022 | 15:22 | NaN | NaN | 40.630820 | -73.886360 | (40.63082, -73.88636) | ROCKAWAY PARKWAY | SHORE PARKWAY | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | Unspecified | NaN | NaN | 4586083 | Station Wagon/Sport Utility Vehicle | Taxi | Station Wagon/Sport Utility Vehicle | NaN | NaN |
| 1949625 | 11/29/2022 | 2:20 | STATEN ISLAND | 10305.0 | 40.611940 | -74.070380 | (40.61194, -74.07038) | NaN | NaN | 255 HYLAN BOULEVARD | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4585934 | Sedan | NaN | NaN | NaN | NaN |
| 1949626 | 11/29/2022 | 15:05 | BROOKLYN | 11220.0 | 40.639854 | -74.012200 | (40.639854, -74.0122) | 57 STREET | 6 AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4586337 | Sedan | Distributo | NaN | NaN | NaN |
| 1949627 | 11/24/2022 | 22:00 | NaN | NaN | 40.812073 | -73.936040 | (40.812073, -73.93604) | EAST 135 STREET | MADISON AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4586345 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN |
| 1949628 | 10/18/2022 | 15:00 | NaN | NaN | 40.797035 | -73.929825 | (40.797035, -73.929825) | EAST 120 STREET | PLEASANT AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4586360 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN |
| 1949629 | 11/29/2022 | 6:25 | BROOKLYN | 11210.0 | 40.625275 | -73.946610 | (40.625275, -73.94661) | NaN | NaN | 2442 NOSTRAND AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | NaN | NaN | 4585982 | Sedan | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN |